Commissioner for Patents

P.O. Box 1450 

Alexandria, VA 22313-1450 



RECEIVED 

DEC 1 0 iOO'* 

OFFICE OF PETITIONS



Writer's Direct Number:

(202) 772-8629 

Internet Address: 

DONF@SKGF.COM

Mail Stop Petition 



Re: 



U.S. Utility Patent Application 

Application No. 09/662,832; Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for Single Instruction 
Multiple Data Processing 

Inventors: Van Hook et al 

Our Ref: 1778.0100002



Sir: 






Transmitted herewith for appropriate action are the following documents: 

1. Petition for an Extension of Time Under 37 C.F.R. § 1.136(a)(1);

2. Fee Transmittal (PTO/SB/17);

3. Request for Reconsideration of Petition Under 37 C.F.R. § 1.47(a);

4. Statement of Facts in Support of Filing On Behalf of Non-Signing 
Inventor Under 37 C.F.R. § 1.47(a), including the following Exhibits: 

A. Exhibit A, consisting of: 

a. Copy of a letter sent to Timothy J. Van Hook on June 3, 2004; 

b. Copy of a FedEx Shipping Label, showing the tracking number 
of the package sent to Timothy Van Hook on June 3, 2004; 

c. Copy of a self-addressed stamped envelope as sent to Timothy
J. Van Hook on June 3, 2004; 

d. Copy of a Supplemental Declaration for Patent Application as 
sent to Timothy J. Van Hook on June 3, 2004; 



e. Copy of U.S. Utility Patent Appl. No. 09/662,832 as filed on
September 15, 2000 and sent to Timothy J. Van Hook on June 
3, 2004; 

f. Copy of a Preliminary Amendment filed September 15, 2000, 
as sent to Timothy J. Van Hook on June 3, 2004; 

g. Copy of a Second Preliminary Amendment filed April 9, 2001, 
as sent to Timothy J. Van Hook on June 3, 2004; 

h. Copy of an Amendment and Reply Under 37 C.F.R. § 1.111 
filed January 24, 2002, as sent to Timothy J. Van Hook on June 
3, 2004; and 

i. Copy of a Preliminary Amendment Under 37 C.F.R. § 1.114
filed May 22, 2002, as sent to Timothy J. Van Hook on June 3, 
2004; 

B. Exhibit B, consisting of an email from FedEx dated June 4, 2004, 
confirming delivery of the FedEx Shipment to Timothy J. Van 
Hook on June 4, 2004; 

C. Exhibit C, consisting of: 

a. Copy of a letter sent to Timothy J. Van Hook on September 22, 
2004; 

b. Copy of a FedEx Shipping Label, showing the tracking number 
of the package sent to Timothy J. Van Hook on September 22, 
2004; 

c. Copy of a letter sent to Peter Yan-Tek Hsu on September 22, 
2004; 

d. Copy of a FedEx Shipping Label, showing the tracking number 
of the package sent to Peter Yan-Tek Hsu on September 22, 
2004; 

e. Copy of a letter sent to William A. Huffman on September 22, 
2004; 

f. Copy of a FedEx Shipping Label, showing the tracking number
of the package sent to William A. Huffman on September 22,
2004; 
g. Copy of a letter sent to Henry P. Moreton on September 22,
2004; 

h. Copy of a FedEx Shipping Label, showing the tracking number 
of the package sent to Henry P. Moreton on September 22, 
2004; 

i. Copy of a letter sent to Earl A. Killian on September 22, 2004; 

j. Copy of a FedEx Shipping Label, showing the tracking number 
of the package sent to Earl A. Killian on September 22, 2004; 

k. Copy of U.S. Utility Patent Appl. No. 09/662,832 as filed on 
September 15, 2000, and sent to the inventors on September 
22, 2004; 

l. Copy of U.S. Utility Patent No. 5,933,650, issued August 3,
1999, as sent to the inventors on September 22, 2004; 

m. Copy of U.S. Utility Patent No. 6,266,758 B1, issued July 24,
2001, as sent to the inventors on September 22, 2004; 

n. Copy of a list of allowed claims for U.S. Utility Application 
No. 09/662,832, filed September 15, 2000, as sent to the 
inventors on September 22, 2004; 

o. Copy of an original executed Declaration and Power of 
Attorney for a Patent Application as filed in grandparent U.S. 
Utility Patent Application No. 08/947,649 on July 16, 1998 (in 
five parts), as sent to the inventors on September 22, 2004; 

p. Copy of a Declaration for Patent Application for U.S. Utility 
Patent No. 5,933,650, issued August 3, 1999, as sent to the 
inventors on September 22, 2004; 

q. Copy of a Declaration for Patent Application for U.S. Utility 
Patent No. 6,266,758 B1, issued July 24, 2001, as sent to the
inventors on September 22, 2004; 

r. Copy of a Declaration for Patent Application for U.S. Utility 
Application No. 09/662,832, filed September 15, 2000, as sent 
to the inventors on September 22, 2004; and 
s. Copy of 37 C.F.R. § 10.18(b) and (c): Effect of Signature and
Certificate for Correspondence Filed in the Patent and
Trademark Office as sent to the inventors on September 22,
2004; 

D. Exhibit D, consisting of an email from FedEx dated September 23, 
2004, confirming delivery of the FedEx Shipment to Henry P. 
Moreton on September 23, 2004; 

E. Exhibit E, consisting of an email from FedEx dated September 23, 
2004, confirming delivery of the FedEx Shipment to Timothy J. 
Van Hook on September 23, 2004; 

F. Exhibit F, consisting of an email from FedEx dated September 23, 
2004, confirming delivery of the FedEx Shipment to William A. 
Huffman on September 23, 2004;

G. Exhibit G, consisting of an email from FedEx dated September 23, 
2004, confirming delivery of the FedEx Shipment to Earl A. 
Killian on September 23, 2004; 

H. Exhibit H, consisting of an email from FedEx dated September 27,
2004, confirming delivery of the FedEx Shipment to Peter Yan-Tek
Hsu on September 27, 2004;

I. Exhibit I, consisting of an email sent by LuAnne M. DeSantis, Esq. 
to Peter Yan-Tek Hsu on October 20, 2004; 

J. Exhibit J, consisting of an email sent by LuAnne M. DeSantis, 
Esq. to Peter Yan-Tek Hsu on November 30, 2004; 

K. Exhibit K, consisting of an email sent by LuAnne M. DeSantis, 
Esq. to William A. Huffman on October 20, 2004;

L. Exhibit L, consisting of an email sent by LuAnne M. DeSantis, 
Esq. to William A. Huffman on November 30, 2004;

M. Exhibit M, consisting of a letter sent by William A. Huffman to
Donald J. Featherstone dated December 7, 2004; 

N. Exhibit N, consisting of an email sent by LuAnne M. DeSantis, 
Esq. to Earl A. Killian on October 25, 2004; 
O. Exhibit O, consisting of an email sent by LuAnne M. DeSantis,
Esq. to Earl A. Killian on November 30, 2004; and 

P. Exhibit P, consisting of an email sent by Earl A. Killian to LuAnne 
M. DeSantis, Esq. on December 1, 2004; 

5. Copy of an original Declaration for Patent Application executed by Henry 
P. Moreton; 

6. Credit Card Payment Form (PTO-2038) for $2360.00 to cover: 

$2160.00 Extension of Time Under 37 C.F.R. § 1.136(a)(1); and
$200.00 Petition Fee Under 37 C.F.R. § 1.17(g); and

7. One (1) return postcard. 

It is respectfully requested that the attached postcard be stamped with the date of filing of 
these documents, and that it be returned to our courier. In the event that additional extensions of
time are necessary to prevent abandonment of this patent application, then such extensions of 
time are hereby petitioned. 

The U.S. Patent and Trademark Office is hereby authorized to charge any fee deficiency, 
or credit any overpayment, to our Deposit Account No. 19-0036. 



Respectfully submitted,




Sterne, Kessler, Goldstein & Fox p.l.l.c.



Donald J. Featherstone 
Attorney for Applicants 
Registration No. 33,876 
In re application of:

Van Hook et al 
For: Alignment and Ordering of Vector
Elements for Single Instruction 
Multiple Data Processing 



Statement of Facts in Support of Filing on Behalf of Non-Signing
Inventors Under 37 C.F.R. § 1.47(a)



Confirmation No.: 
Art Unit: 
Examiner: 
Atty. Docket: 
RECEIVED

DEC 20 2004



Commissioner for Patents 

P.O. Box 1450 

Alexandria, VA 22313-1450 



Mail Stop Petition 



Sir: 



I, LuAnne M. DeSantis, Esq., hereby declare: 



1. I am making this statement of facts in support of filing on behalf of non-
signing inventors under 37 C.F.R. § 1.47(a) with regard to U.S. Non-Provisional Patent
Application No. 09/662,832, filed September 15, 2000 ("the '832 patent application").

2. I am employed at the law firm of Sterne, Kessler, Goldstein & Fox P.L.L.C.
("SKGF"), 1100 New York Avenue, N.W., Washington, D.C. 20005-3934.



3. Mr. Timothy J. Van Hook ("Mr. Van Hook") is an inventor named in the '832
patent application. His last known address as of September 23, 2004, is as follows:

224 Oakgrove Avenue 
Atherton, CA 94027 

4. Mr. Peter Yan-Tek Hsu ("Mr. Hsu") is an inventor named in the '832 patent
application. His last known address as of September 27, 2004, is as follows:

1 Rausch Street, Unit F 
San Francisco, CA 94103 
5. Mr. William A. Huffman ("Mr. Huffman") is an inventor named in the '832
patent application. His last known address as of September 23, 2004, is as follows:

16205 Roseleaf Lane 
Los Gatos, CA 95032 

6. Mr. Earl A. Killian ("Mr. Killian") is an inventor named in the '832 patent
application. His last known address as of September 23, 2004, is as follows:

27961 Central Drive
Los Altos Hills, CA 94022

7. The invention(s) disclosed and/or claimed in the above-identified patent 
application were made while Mr. Van Hook, Mr. Hsu, Mr. Huffman, and Mr. Killian
(collectively, "the non-signing inventors") were employed by Silicon Graphics, Inc. ("SGI"),
2011 N. Shoreline Boulevard, Mountain View, California, 94043-1389. The '832 patent
application is now assigned to MIPS Technologies, Inc. ("MIPS"). The non-signing 
inventors are not currently employed at either SGI or MIPS. 

8. The '832 patent application is a continuation of U.S. Non-Provisional Patent 
Application No. 09/263,798 ("the '798 patent application"), filed March 5, 1999 (now U.S.
Patent No. 6,266,758), which is a continuation of U.S. Non-Provisional Patent Application 
No. 08/947,649 ("the '649 patent application"), filed October 9, 1997 (now U.S. Pat. No. 
5,933,650). 

9. According to SKGF records, a Declaration was signed for the '649 patent
application by all inventors except for Mr. Van Hook. The attorney that filed the application, 
James P. Hao, Esq., signed the Declaration in place of Mr. Van Hook and mistakenly filed it 
under 37 C.F.R. § 1.47(b) in response to a Notice to File Missing Parts of Application (on 
July 16, 1998). The Declaration was filed along with its fee, a Petition Form for Signature by
Person with Sufficient Proprietary Interest on Behalf of Omitted Inventor(s) Who Refuse(s)
to Sign or Cannot Be Reached (37 CFR 1.47(b)) and a Statement of Facts in Support of Filing 
on Behalf of Nonsigning Inventor Under 37 C.F.R. § 1.47(b). Responsibility, as well as the 
physical files, for the '649, the '798, and the '832 patent applications were subsequently
transferred to SKGF. According to SKGF records, no decision was ever received from the
USPTO. (Also note that an unsigned Supplemental Declaration and Power of Attorney for a
Patent Application was subsequently filed in the '649 patent application on January 6, 1999, 
to show the citizenship of all inventors as required in an Office Action dated October 9, 
1998.) 

10. According to SKGF records, the '798 patent application, which claims priority 
to the '649 patent application, was filed with the Declaration that was filed in the '649 patent 
application on July 16, 1998. This Declaration was also filed with the Petition Form and
Statement of Facts that were filed in the '649 patent application. According to SKGF records,
no decision was ever received from the USPTO. 

11. According to SKGF records, the '832 patent application was filed with the 
Declaration that was filed in the '649 patent application on July 16, 1998. The Declaration 
issue was discovered when reviewing the '832 patent application file in preparation for 
paying the Issue Fee. 

12. During the period of April 15 through May 20, 2004, SKGF made multiple 
telephone calls to the Office of Petitions in attempts to get guidance on what action should be 
taken to correct any deficiencies in the originally filed Declaration in the '832 application and 
in a similarly situated, commonly owned application. We initially sought to obtain only the
Declaration signature of Mr. Van Hook. However, during a telephone call on June 15, 2004,
Ms. Alesia M. Brown of the Office of Petitions recommended that a new Declaration be
executed by all inventors listed on the '832 application. Our first attempts to get only Mr.
Van Hook's signature and our subsequent attempts to get signatures from all inventors are
detailed in the following paragraphs. 

13. On the evening of June 2, 2004, I spoke with Mr. Van Hook via telephone. I 
explained the situation and asked if I could verify his mailing address. He verified that his 
mailing address was as indicated above in paragraph number 3. Mr. Van Hook stated that, 
due to the history between his former employer and himself, he may or may not open any 
package sent to him on behalf of our client, and that he may or may not sign or return 
anything. He explained, in some detail, that a lawsuit related to intellectual property and/or 
trade secrets had been brought against him in the past by his former employer. The lawsuit 
had since been dropped. However, Mr. Van Hook stated that he would not sign anything 
without a release stating that suit will never be brought against him again. 

14. On June 3, 2004, SKGF sent a package via Federal Express to Mr. Van Hook 
that included a letter signed by Mr. Donald J. Featherstone, Esq. of SKGF, a copy of the '832 
patent application, copies of subsequent amendments made to the '832 patent application, and 
a Supplemental Declaration (see EXHIBIT A). 

15. On June 4, 2004, SKGF received email confirmation from Federal Express
that the package sent on June 3, 2004, to Mr. Van Hook was received and signed for by
".VANHOOK." (see EXHIBIT B). No response to the package was ever received from Mr.
Van Hook. 

16. On September 1, 2004, I left a voicemail message for Mr. Henry P. Moreton 
("Mr. Moreton," a co-inventor), briefly explaining the situation and asking for a return call. 
No return call was received. 

17. On September 1, 2004, I spoke with Mr. Hsu via telephone and briefly 
explained the situation. Mr. Hsu verified that his current mailing address was as indicated
above in paragraph number 4. On September 2, 2004, I emailed Mr. Hsu to verify his correct
full name. He responded via email the same day with an explanation of his full name. 

18. On September 1, 2004, I left a voicemail message for Mr. Huffman, briefly 
explaining the situation and asking for a return call. On September 2, 2004, I received an 
email from Jeannette Schreckenghaust at SGI (Mr. Huffman's former employer), forwarding
an email from Mr. Huffman asking if she was aware of the situation and asking if she was
familiar with the requesting law firm (SKGF). On September 9, 2004, Mr. Huffman
telephoned me and verified that his current mailing address was as indicated above in 
paragraph number 5. 

19. On September 1, 2004, I left a voicemail message for Mr. Killian, briefly 
explaining the situation and asking for a return call. No return call was received. 

20. On September 22, 2004, SKGF sent a package via Federal Express to all of the 
inventors (Mr. Van Hook, Mr. Hsu, Mr. Huffman, Mr. Killian, and Mr. Moreton). The 
package included a letter signed by Mr. Donald J. Featherstone, Esq. of SKGF, copies of the 
'758 and '650 patents, a copy of the '832 patent application and currently allowed claims, a
copy of the original Declaration from the '649 patent application, three new Declarations (one 
for each of the '649, '798, and '832 patent applications), and a copy of 37 C.F.R. § 10.18(b) 
and (c) (see EXHIBIT C). 

21. On September 23, 2004, SKGF received emails from Federal Express 
confirming that the packages sent on September 22, 2004, were delivered to Mr. Moreton 
(signed for by "K.GREEN.", see EXHIBIT D), Mr. Van Hook (signed for by 
".VANHOOK.", see EXHIBIT E), Mr. Huffinan (signed for by "W..HUFFMAN.", see 
EXHIBIT F), and Mr. Killian (signed for by "L.EE.", see EXHIBIT G). 

22. On September 27, 2004, SKGF received email confirmation from Federal 
Express that the package sent on September 22, 2004, was delivered to Mr. Hsu (signed for 
by "JTRACY.", see EXHIBIT H). 

23. On October 7, 2004, newly executed Declarations were received from Mr. 
Moreton. 

24. On October 20, 2004, I sent a follow-up email to Mr. Hsu (see EXHIBIT I). 
No response was received. On October 29, 2004, I left a voicemail message for Mr. Hsu 
asking him to return the call. No return call was received. On November 30, 2004, I sent 
another follow-up email (see EXHIBIT J) to Mr. Hsu that includes the statement, "If we do 
not receive a response from you by Monday, December 13, 2004, we will assume that you 
refuse to cooperate in prosecuting/maintaining the above-captioned patent 
application/patents." No response was received. I have not received any further 
communications from Mr. Hsu. 
25. On October 20, 2004, I sent a follow-up email to Mr. Huffman (see EXHIBIT
K). No response was received. On October 29, 2004, I telephoned Mr. Huffman and was 
asked by the female person who answered the telephone to call back in a few minutes 
because Mr. Huffman was indisposed. Upon returning the call, I was told by the same female 
person that he had since stepped out and that she would give him a message to call me. On 
October 30, 2004, Mr. Huffman left a message on my voicemail stating that he had not yet 
looked at the package and therefore had nothing to report to me at the time. On November 
30, 2004, I sent another follow-up email to Mr. Huffman (see EXHIBIT L) that includes the
statement, "If we do not receive a response from you by Monday, December 13, 2004, we 
will assume that you refuse to cooperate in prosecuting/maintaining the above-captioned 
patent application/patents." On December 10, 2004, a letter addressed to Donald J. 
Featherstone from William A. Huffman was received by SKGF (see EXHIBIT M), along
with a newly executed Declaration for the grandparent application (the '649 patent
application, now U.S. Patent No. 5,933,650). Mr. Huffman stated in his letter that he did not
have time to review the '758 patent or '832 patent application and therefore would be "unable
to assist you [SKGF] with the two additional declarations." I have not received any further 
communications from Mr. Huffman. 

26. On October 25, 2004, I sent a follow-up email to Mr. Killian (see EXHIBIT 
N). No response was received. On October 29, 2004, I spoke with Mr. Killian via telephone
and explained the situation briefly. I asked him if he had any questions regarding the 
package sent on September 22, 2004. Mr. Killian stated that he would consider looking at the 
package and would call back if he had any questions. He expressed hesitation about signing or
certifying anything that he had worked on such a long time ago in his career. On November
30, 2004, I sent another follow-up email to Mr. Killian (see EXHIBIT O) that includes the
statement, "If we do not receive a response from you by Monday, December 13, 2004, we 
will assume that you refuse to cooperate in prosecuting/maintaining the above-captioned 
patent application/patents." Mr. Killian responded by email on December 1, 2004, indicating 
that he believes reviewing the documentation is not currently a good use of his time and that 
he will not be signing the Declarations (see EXHIBIT P). I have not received any further 
communications from Mr. Killian. 

27. On October 29, 2004, I telephoned Mr. Van Hook and left a message asking if
he had any questions regarding the package sent on September 22, 2004. He returned the call 
the same day. I explained the situation again briefly. He stated that he had received the 
package, but again refused to review or sign anything. I have not received any further
communications from Mr. Van Hook. 

I declare that all statements made herein of my own knowledge are true and that all 
statements made on information from review of the file history of the patent application are 
believed to be true, and further that these statements were made with the knowledge that 
willful false statements or the like so made are punishable by fine or imprisonment or both 
under Section 1001 of Title 18 of the United States Code, and that such willful false
statements may jeopardize the validity of the patent application or any patent issued thereon. 

Respectfully submitted, 

Sterne, Kessler, Goldstein & Fox p.l.l.c. 

LuAnne M. DeSantis 
Petition for Extension of Time Under 37 C.F.R. § 1.136(a)(1)



Commissioner for Patents

P.O. Box 1450

Alexandria, VA 22313-1450

RECEIVED
DEC 20 2004

It is hereby requested that the period for replying to the Decision Refusing Status 
Under 37 C.F.R. § 1.47(a) be extended five months from July 14, 2004 to December 14, 2004 
by the filing of this Petition and fee payment. 

The petition fee set forth in 37 C.F.R. § 1.17(a) is believed to be $2,160.00 for a five-
month extension of time for a large entity. Fee payment is provided in our accompanying 
Credit Card Payment Form (PTO-2038). However, if extensions of time under 37 C.F.R. § 
1.136 other than those provided herewith are required to prevent abandonment of the present 
patent application, then such extensions of time are hereby petitioned. 

The U.S. Patent and Trademark Office is hereby authorized to charge any fee 
deficiency, or credit any overpayment, to our Deposit Account No. 19-0036. 

Respectfully submitted,

Sterne, Kessler, Goldstein & Fox p.l.l.c.

Date:



Donald J. Featherstone
Attorney for Applicants 
Registration No. 33,876 



1100 New York Avenue, N.W.
Washington, D.C. 20005-3934
(202) 371-2600
PTO/SB/17 (10-04v2)
OMB 0651-0032
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
FEE TRANSMITTAL for FY 2005

Effective 10/01/2004. Patent fees are subject to annual revision. 



□ Applicant claims small entity status. See 37 CFR 1.27



TOTAL AMOUNT OF PAYMENT ($) 2360.00



Complete if Known
Application Number: 09/662,832
Filing Date: September 15, 2000
First Named Inventor: Timothy J. Van Hook
Examiner Name: Pan, Daniel H.
Art Unit: 2183
Attorney Docket No.: 1778.0100002



METHOD OF PAYMENT (check all that apply)

□ Check [x] Credit card □ Money Order [x] Other □ None

□ Deposit Account: **Charge any deficiencies or credit any overpayments in the fees to Deposit Acct. No. 19-0036.

Deposit Account Number: 19-0036
Deposit Account Name: Sterne, Kessler, Goldstein & Fox P.L.L.C.



The Director is authorized to: (check all that apply) 

n Charge fee(s) indicated below Credit any overpayments 

n Charge any additional fee(s) or any underpayment of fee(s) 

[jj] Charge fee(s) indicated below, except for the filing fee 

to the above-identified deposit account. 



FEE CALCULATION

1. BASIC FILING FEE

Large Entity         Small Entity
Fee Code  Fee ($)    Fee Code  Fee ($)    Fee Description          Fee Paid ($)
1001      790        2001      395        Utility filing fee
1002      350        2002      175        Design filing fee
1003      550        2003      275        Plant filing fee
1004      790        2004      395        Reissue filing fee
1005      160        2005      80         Provisional filing fee

SUBTOTAL (1) ($) 0.00



2. EXTRA CLAIM FEES FOR UTILITY AND REISSUE

                       Extra Claims     Fee from below     Fee Paid ($)
Total Claims     ___ -20** = ___    x  ___              =  ___
Independent Claims ___ -3** = ___   x  ___              =  ___
Multiple Dependent                                          ___

Large Entity         Small Entity
Fee Code  Fee ($)    Fee Code  Fee ($)    Fee Description
1202      18         2202      9          Claims in excess of 20
1201      88         2201      44         Independent claims in excess of 3
1203      300        2203      150        Multiple dependent claim, if not paid
1204      88         2204      44         ** Reissue independent claims over original patent
1205      18         2205      9          ** Reissue claims in excess of 20 and over original patent

SUBTOTAL (2) ($) 0.00

**or number previously paid, if greater



3. ADDITIONAL FEES

Large Entity         Small Entity
Fee Code  Fee ($)    Fee Code  Fee ($)    Fee Description
1051      130        2051      65         Surcharge - late filing fee or oath
1052      50         2052      25         Surcharge - late provisional filing fee or cover sheet
1053      130        1053      130        Non-English specification
1812      2,520      1812      2,520      For filing a request for ex parte reexamination
1804      920*       1804      920*       Requesting publication of SIR prior to Examiner action
1805      1,840*     1805      1,840*     Requesting publication of SIR after Examiner action
1251      110        2251      55         Extension for reply within first month
1252      430        2252      215        Extension for reply within second month
1253      980        2253      490        Extension for reply within third month
1254      1,530      2254      765        Extension for reply within fourth month
1255      2,080      2255      1,040      Extension for reply within fifth month
1401      340        2401      170        Notice of Appeal
1402      340        2402      170        Filing a brief in support of an appeal
1403      300        2403      150        Request for oral hearing
1451      1,510      1451      1,510      Petition to institute a public use proceeding
1452      110        2452      55         Petition to revive - unavoidable
1453      1,370      2453      685        Petition to revive - unintentional
1501      1,370      2501      685        Utility issue fee (or reissue)
1502      490        2502      245        Design issue fee
1503      660        2503      330        Plant issue fee
1460      130        1460      130        Petitions to the Commissioner
1807      50         1807      50         Processing fee under 37 CFR 1.17(q)
1806      180        1806      180        Submission of Information Disclosure Stmt
8021      40         8021      40         Recording each patent assignment per property (times number of properties)
1809      790        2809      395        Filing a submission after final rejection (37 CFR 1.129(a))
1810      790        2810      395        For each additional invention to be examined (37 CFR 1.129(b))
1801      790        2801      395        Request for Continued Examination (RCE)
1802      900        1802      900        Request for expedited examination of a design application
Other fee (specify)

*Reduced by Basic Filing Fee Paid

Fee Paid ($): 2160.00; 200.00

SUBTOTAL (3) ($) 2360.00



SUBMITTED BY (Complete (if applicable))

Name (Print/Type): Donald J. Featherstone
Registration No. (Attorney/Agent): 33,876
Telephone: (202) 371-2600
Signature:
WARNING: Information on this form may become public. Credit card information should not be included on this form. Provide credit card information and authorization on PTO-2038.

This collection of information is required by 37 CFR 1.17 and 1.27. The information is required to obtain or retain a benefit by the public which is to file (and by the
USPTO to process) an application. Confidentiality is governed by 35 U.S.C. 122 and 37 CFR 1.14. This collection is estimated to take 12 minutes to complete,
including gathering, preparing, and submitting the completed application form to the USPTO. Time will vary depending upon the individual case. Any comments on
the amount of time you require to complete this form and/or suggestions for reducing this burden, should be sent to the Chief Information Officer, U.S. Patent and
Trademark Office, U.S. Department of Commerce, P.O. Box 1450, Alexandria, VA 22313-1450. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS.
SEND TO: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450.

If you need assistance in completing the form, call 1-800-PTO-9199 and select option 2.



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



In re application of:

Van Hook et al

For: Alignment and Ordering of Vector
Elements for Single Instruction
Multiple Data Processing

Confirmation No.: 2552

Art Unit: 2183




Request for Reconsideration of Petition Under 37 C.F.R. § 1.47(a) 



Commissioner for Patents 

P.O. Box 1450 

Alexandria, VA 22313-1450 



Mail Stop Petition 



Sir: 



In response to the Decision Refusing Status Under 37 CFR 1.47(a) mailed May 14, 
2004, and in accordance with the requirements of 37 C.F.R. § 1.47(a) and M.P.E.P. 
§ 409.03(a), Petitioner has filed herewith the following documents: 

(1) An original Declaration for Patent Application executed by Henry P. 
Moreton, fulfilling the requirements of 37 C.F.R. § 1.47(a); 

(2) Statement of Facts in Support of Filing On Behalf of Non-Signing
Inventors Under 37 C.F.R. § 1.47(a) from LuAnne M. DeSantis, Esq., 
along with referenced Exhibits A-P. 

The Declaration for Patent Application has been signed by one of the five named joint
inventors. Timothy J. Van Hook, Peter Yan-Tek Hsu, William A. Huffman, and Earl A.
Killian have not executed the Declaration for Patent Application. Petitioner submits that the
Declaration for Patent Application signed by Henry P. Moreton, with the signature blocks of 
the non-signing inventors left blank, should be considered as having been signed by all of the 
joint inventors on behalf of the non-signing inventors. See M.P.E.P. § 409.03(a)(A). 
Van Hook et al.
Appl. No. 09/662,832 

The Statement of Facts in Support of Filing on Behalf of Non-Signing Inventors 
Under 37 C.F.R. § 1.47(a), as required by M.P.E.P. § 409.03(a)(B), from LuAnne M. 
DeSantis, Esq., provides proof of the pertinent facts that the non-signing inventors refuse to 
sign. The Statement of Facts also indicates the last known addresses of the non-signing 
inventors as required by M.P.E.P. § 409.03(a)(C). 

During the period April 15 through May 20, 2004, multiple telephone calls were made
to the Office of Petitions in attempts to get guidance on what action should be taken to 
correct any deficiencies in the originally filed Declaration in the '832 application and in a 
similarly situated, commonly owned application. We initially sought to obtain only the 
Declaration signature of Mr. Timothy J. Van Hook, since he was the only non-signing
inventor on the original Declaration filed in the grandparent application (U.S. Pat. Appl. No. 
08/947,649, now U.S. Pat. No. 5,933,650). However, during a telephone call on June 15, 
2004, Ms. Alesia M. Brown of the Office of Petitions stated that because the original 
Declaration filed in the grandparent application was executed improperly and because there is 
no indication that Rule 47 status was ever granted in the grandparent application, she 
recommended that a new Declaration be executed by all inventors listed on the '832
application. Our first attempts to get only Mr. Van Hook's signature and our subsequent 
attempts to get signatures from all inventors are detailed in the accompanying Statement of 
Facts. 

Petitioner therefore respectfully submits that the documents filed herewith satisfy all 
the requirements of 37 C.F.R. § 1.47(a) and M.P.E.P. §§ 409.03(a), (d), and (e). 
Accordingly, Petitioner respectfully requests that the present application be accorded status 
under 37 C.F.R. § 1.47(a). 

Payment of the fee for filing a petition under 37 C.F.R. § 1.47 set forth in 37 C.F.R. § 
1.17(g) is provided in the accompanying Credit Card Payment Form (PTO-2038). 

A Petition Under 37 C.F.R. § 1.136(a)(1) for a five-month extension of time 
accompanies this request. It is believed that no additional extension of time is necessary. 
However, if an additional extension of time is required to prevent abandonment of the present 
application, then such extension is hereby petitioned. 

The U.S. Patent and Trademark Office is hereby authorized to charge any fee 
deficiency, or credit any overpayment, to our Deposit Account No. 19-0036. 



Respectfully submitted,




Sterne, Kessler, Goldstein & Fox p.l.l.c. 




Donald J. Featherstone
Attorney for Applicants
Registration No. 33,876 



1100 New York Avenue, N.W.
Washington, D.C. 20005-3934
(202) 371-2600
Declaration for Patent Application

Docket Number: 1778.0100002

As a below named inventor, I hereby declare that:

My residence, mailing address and citizenship are as stated below next to my name. 

I believe I am an original, first and joint inventor of the subject matter that is claimed and for which a patent is sought on the
invention entitled Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing, the
specification of which is attached hereto unless the following box is checked:

☒ was filed on September 15, 2000;
as United States Application Number 09/662,832; and

was amended on September 15, 2000; April 9, 2001; January 24, 2002; May 22, 2002; and June 18, 2004.

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as 
amended by any amendment referred to above. 

I acknowledge the duty to disclose information that is material to patentability as defined in 37 C.F.R. § 1.56, including for 
continuation-in-part applications, material information which became available between the filing date of the prior 
application and the national or PCT international filing date of the continuation-in-part application.

I hereby claim foreign priority benefits under 35 U.S.C. § 119(a)-(d) or (f) or § 365(b) of any foreign application(s) for 
patent, inventor's or plant breeder's rights certificate(s), or § 365(a) of any PCT international application, which designated
at least one country other than the United States of America, listed below, and have also identified below, by checking the
box, any foreign application for patent, inventor's or plant breeder's rights certificate(s), or PCT international application
having a filing date before that of the application on which priority is claimed. 

Prior Foreign Application(s): Priority Claimed

□ Yes □ No
(Application No.) (Country) (Day/Month/Year Filed)

□ Yes □ No
(Application No.) (Country) (Day/Month/Year Filed)



Send Correspondence to: Sterne, Kessler, Goldstein & Fox P.L.L.C.
1100 New York Avenue, N.W.
Washington, D.C. 20005-3934

Direct Telephone Calls to: (202) 371-2600
Appl. No. 09/662,832
Docket No. 1778.0100002 



I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information 
and belief are believed to be true; and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under 18 U.S.C. § 1001 and that such 
willful false statements may jeopardize the validity of the application or any patent issued thereon. 



Full name of first Inventor: 


Timothy J. Van Hook 




Signature of first Inventor: 




Date: 


Residence: 


Atherton, CA




Citizenship: 


U.S.A. 




Mailing Address: 


224 Oakgrove Avenue 
Atherton, CA, 94027






Full name of second Inventor: 


Peter Yan-Tek Hsu 




Signature of second Inventor: 




Date: 


Residence: 


San Francisco, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


1 Rausch Street, Unit F 
San Francisco, CA, 94103 






Full name of third Inventor: 


William A. Huffman




Signature of third Inventor: 




Date: 


Residence: 


Los Gatos, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


16205 Roseleaf Lane 

Los Gatos, CA, 95032-3610 
Full name of fourth Inventor:


Henry P. Moreton




Signature of fourth Inventor: 


[signature]


Date: 


Residence: 


Woodside, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


140 Phillip Road 
Woodside, CA, 94062-2625 






Full name of fifth Inventor: 


Earl A. Killian 




Signature of fifth Inventor: 




Date: 


Residence: 


Los Altos Hills, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


27961 Central Drive

Los Altos Hills, CA, 94022 
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Q Sterne Kesslef 
Goldstein Fox 

ATTORNEYS AT LAW 






June 3, 2004 

Writer's Direct Number:

(202) 772-8629 

Internet Address: 

D0NF@SKGF.COM 

Timothy J. van Hook Via Federal Express 

224 Oakgrove Avenue 
Atherton, CA 94027 



Re: Declaration for U.S. Utility Patent Application 

Application No. 09/662,832; Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for Single Instruction 
Multiple Data Processing 

Our Ref: 1778.0100002/DJF/LMY 



Dear Mr. van Hook: 

Further to your conversation with LuAnne DeSantis on June 2, 2004, enclosed please 
find a copy of the above-captioned patent application as filed in the United States Patent and 
Trademark Office (USPTO) on September 15, 2000, along with copies of four subsequently filed 
amendments for your review. 

Also enclosed for your review and execution please find a "Supplemental Declaration for 
Patent Application." Once you have completed your review of the specification, drawings, and 
amendments, and if the information listed on the Declaration is correct and complete, we ask that 
you sign and date the Declaration in the space provided and return it to us using the enclosed 
self-addressed stamped envelope. 

If, for any reason, you do not wish to execute the enclosed Declaration, we ask that you 
please review and execute the statement at the close of this letter. This statement simply verifies 
your receipt of this letter, states that you elected not to sign the enclosed Supplemental 
Declaration for the above-captioned currently pending patent application and states that you 
previously elected not to sign the Declaration for parent application 08/947,649. If the statement
meets with your approval, please sign and date in the space provided below. Then, please return
a copy of this letter to us in the enclosed self-addressed stamped envelope at your earliest 
possible convenience. 



Sterne, Kessler, Goldstein & Fox PLLC : 1100 New York Avenue, NW : Washington, DC 20005 : 202.371.2600 f 202.371.2540 : www.skgf.com




We greatly appreciate your assistance in this matter. If you have any questions, please do 
not hesitate to contact us. 



I, Timothy J. van Hook, hereby state that I elect not to execute the Supplemental Declaration for 
Patent Application for currently pending U.S. Patent Application No. 09/662,832, filed 
September 15, 2000 (which is a continuation of U.S. Application No. 09/263,798, filed March 5,
1999, and U.S. Application No. 08/947,649, filed October 9, 1997) for the invention entitled
Alignment and Ordering of Vector Elements for Single Instruction Multiple Data 
Processing, the specification and subsequent amendments to which have been provided to me 
for my review by Sterne, Kessler, Goldstein & Fox, P.L.L.C. 

Furthermore, I hereby confirm my past election not to execute the Declaration for parent U.S. 
Application No. 08/947,649 in October 1997 and March 1998, the specification of which was 
provided to me by the Law Offices of Wagner, Murabito & Hao. 

Signed: Date: 

Timothy J. van Hook 



Very truly yours, 



Sterne, Kessler, Goldstein & Fox p.l.l.c. 




Donald J. Featherstone 
From: Origin ID: (202)371-2600
Donald J. Featherstone
Sterne Kessler Goldstein & Fox
1100 New York Avenue, NW
Washington, DC 20005

FedEx Express

SHIP TO: (650)325-7605
Timothy J. van Hook
224 Oakgrove Avenue
Atherton, CA 94027

BILL SENDER

Ship Date: 03JUN04
Actual Wgt: 1 LB
System#: 1162221/INET1800
Account#: S

REF: 1778.0100002

Delivery Address Bar Code

PRIORITY OVERNIGHT
TRK# 7918 5741 6715

94027-CA-US

SFO

Deliver By: 04JUN04

A2




Supplemental Declaration for Patent Application



Docket Number: 1778.0100002 



As a below named inventor, I hereby declare that: 

My residence, mailing address and citizenship are as stated below next to my name. 

I believe I am an original, first and joint inventor of the subject matter that is claimed and for which a patent is sought on the 
invention entitled Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing, the 
specification of which is attached hereto unless the following box is checked: 

☒ was filed on September 15, 2000, as United States Application Number 09/662,832 (which is a continuation of
U.S. Application Number 09/263,798, filed March 5, 1999, which is a continuation of U.S. Application Number 
08/947,649, filed October 9, 1997); and 

was amended on September 15, 2000; April 9, 2001; January 24, 2002; and May 22, 2002. 

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as 
amended by any amendment referred to above. 

I acknowledge the duty to disclose information that is material to patentability as defined in 37 C.F.R. § 1.56, including for 
continuation-in-part applications, material information which became available between the filing date of the prior 
application and the national or PCT international filing date of the continuation-in-part application. 

I hereby claim foreign priority benefits under 35 U.S.C. § 119(a)-(d) or (f) or § 365(b) of any foreign application(s) for
patent, inventor's or plant breeder's rights certificate(s), or § 365(a) of any PCT international application, which designated 
at least one country other than the United States of America, listed below, and have also identified below, by checking the 
box, any foreign application for patent, inventor's or plant breeder's rights certificate(s), or PCT international application 
having a filing date before that of the application on which priority is claimed. 

Prior Foreign Application(s): Priority Claimed

□ Yes □ No
(Application No.) (Country) (Day/Month/Year Filed)

□ Yes □ No
(Application No.) (Country) (Day/Month/Year Filed)



Send Correspondence to: 



Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue, N.W.
Washington, D.C. 20005-3934 



Direct Telephone Calls to: 



(202) 371-2600 
Appl. No. 09/662,832

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information
and belief are believed to be true; and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under 18 U.S.C. § 1001 and that such 
willful false statements may jeopardize the validity of the application or any patent issued thereon. 



Full name of first Inventor: 


Timothy J. van Hook 




Signature of first Inventor: 




Date: 


Residence: 


Atherton, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


224 Oakgrove Avenue 
Atherton, CA 94027 
Declaration for Patent Application



Docket Number: 1778.0100002



As a below named inventor, I hereby declare that: 

My residence, mailing address and citizenship are as stated below next to my name. 

I believe I am an original, first and joint inventor of the subject matter that is claimed and for which a patent is sought on the 
invention entitled Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing, the
specification of which is attached hereto unless the following box is checked:

☒ was filed on September 15, 2000;
as United States Application Number 09/662,832; and

was amended on September 15, 2000; April 9, 2001; January 24, 2002; May 22, 2002; and June 18, 2004. 

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as 
amended by any amendment referred to above. 

I acknowledge the duty to disclose information that is material to patentability as defined in 37 C.F.R. § 1.56, including for
continuation-in-part applications, material information which became available between the filing date of the prior 
application and the national or PCT international filing date of the continuation-in-part application. 

I hereby claim foreign priority benefits under 35 U.S.C. § 119(a)-(d) or (f) or § 365(b) of any foreign application(s) for 
patent, inventor's or plant breeder's rights certificate(s), or § 365(a) of any PCT international application, which designated
at least one country other than the United States of America, listed below, and have also identified below, by checking the
box, any foreign application for patent, inventor's or plant breeder's rights certificate(s), or PCT international application
having a filing date before that of the application on which priority is claimed. 

Prior Foreign Application(s): Priority Claimed

□ Yes □ No
(Application No.) (Country) (Day/Month/Year Filed)

□ Yes □ No
(Application No.) (Country) (Day/Month/Year Filed)



Send Correspondence to: 



Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue, N.W.
Washington, D.C. 20005-3934 



Direct Telephone Calls to: 



(202) 371-2600 
Appl. No. 09/662,832
Docket No. 1778.0100002 



I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information 
and belief are believed to be true; and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under 18 U.S.C. § 1001 and that such 
willful false statements may jeopardize the validity of the application or any patent issued thereon. 



Full name of first Inventor: 


Timothy J. Van Hook 




Signature of first Inventor: 




Date: 


Residence: 


Atherton, CA




Citizenship: 


U.S.A. 




Mailing Address: 


224 Oakgrove Avenue 
Atherton, CA, 94027






Full name of second Inventor: 


Peter Yan-Tek Hsu 




Signature of second Inventor: 




Date: 


Residence: 


San Francisco, CA




Citizenship: 


U.S.A. 




Mailing Address: 


1 Rausch Street, Unit F 
San Francisco, CA, 94103 






Full name of third Inventor: 


William A. Huffman 




Signature of third Inventor: 




Date: 


Residence: 


Los Gatos, CA 




Citizenship: 


U.S.A.




Mailing Address: 


16205 Roseleaf Lane 

Los Gatos, CA, 95032-3610 
Declaration for Patent Application



Docket Number: 1778.0100001



As a below named inventor, I hereby declare that: 

My residence, mailing address and citizenship are as stated below next to my name. 

I believe I am an original, first and joint inventor of the subject matter that is claimed and for which a patent is sought on the 
invention entitled Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing, the
specification of which is attached hereto unless the following box is checked:

☒ was filed on March 5, 1999;

as United States Application Number 09/263,798 (now U.S. Patent No. 6,266,752 B1; Issued: July 24, 2001); and
was amended on September 1, 1999; May 1, 2000; and May 19, 2000. 

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as 
amended by any amendment referred to above. 

I acknowledge the duty to disclose information that is material to patentability as defined in 37 C.F.R. § 1.56, including for 
continuation-in-part applications, material information which became available between the filing date of the prior 
application and the national or PCT international filing date of the continuation-in-part application.

I hereby claim foreign priority benefits under 35 U.S.C. § 119(a)-(d) or (f) or § 365(b) of any foreign application(s) for
patent, inventor's or plant breeder's rights certificate(s), or § 365(a) of any PCT international application, which designated
at least one country other than the United States of America, listed below, and have also identified below, by checking the 
box, any foreign application for patent, inventor's or plant breeder's rights certificate(s), or PCT international application 
having a filing date before that of the application on which priority is claimed. 

Prior Foreign Application(s): Priority Claimed

□ Yes □ No
(Application No.) (Country) (Day/Month/Year Filed)

□ Yes □ No
(Application No.) (Country) (Day/Month/Year Filed)



Send Correspondence to: 



Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue, N.W.
Washington, D.C. 20005-3934 



Direct Telephone Calls to: 



(202) 371-2600 






Appl. No. 09/263,798
Docket No. 1778.0100001 



I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information 
and belief are believed to be true; and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under 18 U.S.C. § 1001 and that such 
willful false statements may jeopardize the validity of the application or any patent issued thereon. 



Full name of first Inventor: 


Timothy J. Van Hook 




Signature of first Inventor: 




Date: 


Residence: 


Atherton, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


224 Oakgrove Avenue
Atherton, CA, 94027 






Full name of second Inventor: 


Peter Yan-Tek Hsu 




Signature of second Inventor: 




Date: 


Residence: 


San Francisco, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


1 Rausch Street, Unit F 
San Francisco, CA, 94103 






Full name of third Inventor: 


William A. Huffman 




Signature of third Inventor: 




Date: 


Residence: 


Los Gatos, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


16205 Roseleaf Lane 

Los Gatos, CA, 95032-3610 








Full name of fourth Inventor: 


Henry P. Moreton 




Signature of fourth Inventor: 




Date: 


Residence: 


Woodside, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


140 Phillip Road 
Woodside, CA, 94062-2625 






Full name of fifth Inventor: 


Earl A. Killian 




Signature of fifth Inventor: 




Date: 


Residence: 


Los Altos Hills, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


27961 Central Drive 

Los Altos Hills, CA, 94022 
Attorney Docket No.: SGI 15-4-457.00



Declaration and Power of Attorney 
for a Patent Application 

Declaration 

As a below named inventor, I hereby declare that my residence, post office address, and citizenship are as stated
below next to my name. Further, I hereby declare that I believe I am the original, first and sole inventor (if only one name is
listed below) or an original, first and joint inventor (if plural names are listed below) of the subject matter which is
claimed and for which a patent is sought on the invention entitled:

ALIGNMENT AND ORDERING OF VECTOR ELEMENTS FOR SINGLE INSTRUCTION MULTIPLE DATA 
PROCESSING 

the specification of which: 
☐ is attached hereto, or

☒ was filed on 10/9/97 as application serial no. 08/947,649; and

was amended on ________

I hereby state that I have reviewed and understand the contents of the above identified specification, including 
the claims, as amended by any amendment referred to above; and 

I acknowledge the duty to disclose information which is material to the examination of this application in 
accordance with Title 37, Code of Federal Regulations, Section 1.56(a).



Foreign Priority Claim 

I hereby claim foreign priority benefits under Title 35, United States Code Section 119 of any foreign application(s)
for patent or inventor's certificate listed below and have also identified below any foreign application for patent or
inventor's certificate having a filing date before that of the application on which priority is claimed: 

Number Country Date Filed Priority Claimed 

yes no

yes no



U.S. Priority Claim 

I hereby claim the benefit under Title 35, United States Code, Section 120 of any United States application(s)
listed below and, insofar as the subject matter of each of the claims of this application is not disclosed in the prior
United States application in the manner provided by the first paragraph of Title 35, United States Code, Section
112, I acknowledge the duty to disclose material information as defined in Title 37, Code of Federal Regulations,
Section 1.56(a) which occurred between the filing date of the prior application and the national or PCT 
international filing date of this application: 

Serial Number Filing Date Status (patented/pending/abandoned) 



Power of Attorney 

As a named inventor, I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and 
transact all business in the Patent Trademark Office connected therewith. 

[illegible] Registration No.: 36,398

[illegible] Registration No.: 35,295

[illegible] Registration No.: 35,398

Glenn D. [illegible] Registration No.: [illegible]

[illegible] Registration No.: P-41,923

[illegible] Registration No.: 38,330

Chris B[illegible] Registration No.: [illegible]

[illegible] Registration No.: 34,625

[illegible] Registration No.: 40,530

Send Correspondence to: 

WAGNER, MURABITO & HAO 

Two North Market Street, Third Floor
San Jose, California 95113 
(408) 938-9060 



Signatures 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on
information and belief are believed to be true; and further that these statements were made with the knowledge
that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 

Full Name of Sole/First Inventor: Timothy J. van Hook



Inventor's Signature [illegible] Date

Residence Atherton, California Citizenship

P.O. Address 224 Oakgrove [illegible]

Full Name of Second/Joint Inventor: Peter Hsu 



Inventor's Signature Date 

Residence Fremont, California Citizenship

P.O. Address 2853 Welk [illegible]

Full Name of Third/Joint Inventor: William A. Huffman



Inventor's Signature Date 

Residence Los Gatos, California Citizenship USA

P.O. Address 16205 Roseleaf Lane, Los Gatos, California 95032 



Full Name of Fourth/Joint Inventor: Henry P. Moreton



Inventor's Signature Date

Residence Woodside, California Citizenship USA

P.O. Address 140 Phillip Road, Woodside, California [illegible]



Full Name of Fifth/Joint Inventor: Earl A. Killian



Inventor's Signature Date 

Residence Los Altos Hills, California Citizenship USA

P.O. Address 27961 Central Drive, Los Altos Hills



Attorney Docket No.: SGI 15-4-457.00

Declaration and Power of Attorney 
for a Patent Application 

Declaration 

As a below named inventor, I hereby declare that my residence, post office address, and citizenship are as stated
below next to my name. Further, I hereby declare that I believe I am the original, first and sole inventor (if only one name is

listed below) or an original, first and joint inventor (if plural names are listed below) of the subject matter which is 
claimed and for which a patent is sought on the invention entitled: 

ALIGNMENT AND ORDERING OF VECTOR ELEMENTS FOR SINGLE INSTRUCTION MULTIPLE DATA 
PROCESSING 

the specification of which: 

is attached hereto, or 

☒ was filed on [illegible] as application serial no. 08/947,649; and

was amended on 



I hereby state that I have reviewed and understand the contents of the above identified specification, including 
the claims, as amended by any amendment referred to above; and

I acknowledge the duty to disclose information which is material to the examination of this application in 
accordance with Title 37, Code of Federal Regulations, Section 1.56(a). 



Foreign Priority Claim 

I hereby claim foreign priority benefits under Title 35, United States Code Section 119 of any foreign application(s)
for patent or inventor's certificate listed below and have also identified below any foreign application for patent or
inventor's certificate having a filing date before that of the application on which priority is claimed:

Number Country Date Filed Priority Claimed 

yes no 

yes no 



U.S. Priority Claim 

I hereby claim the benefit under Title 35, United States Code, Section 120 of any United States application(s)
listed below and, insofar as the subject matter of each of the claims of this application is not disclosed in the prior
United States application in the manner provided by the first paragraph of Title 35, United States Code, Section
112, I acknowledge the duty to disclose material information as defined in Title 37, Code of Federal Regulations,
Section 1.56(a) which occurred between the filing date of the prior application and the national or PCT 
international filing date of this application: 

Serial Number Filing Date Status (patented/pending/abandoned) 



Power of Attorney 

As a named inventor, I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and
transact all business in the Patent Trademark Office connected therewith. 



James P. Hao Registration No.: 36,398

Anthony [illegible] Registration No.: 35,295

[illegible] Registration No.: 35,398

[illegible] Registration No.: [illegible]

Wilfred [illegible] Registration No.: P-41,923

Steve Weiner Registration No.: 38,330

Chris B[illegible] Registration No.: [illegible]

Irene Fernandez Registration No.: 34,625

Jota Brig Registration No.: 40,530

Send Correspondence to:



WAGNER, MURABITO & HAO 

Two North Market Street, Third Floor
San Jose, California 95113 
(408) 938-9060 



Signatures 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on
information and belief are believed to be true; and further that these statements were made with the knowledge
that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 

Full Name of Sole/First Inventor: Timothy J. van Hook 



Inventor's Signature                Date 

Residence Atherton, California    Citizenship USA 

(City, State) 
P.O. Address 224 Oakgrove Road, Atherton, California 94027 



Full Name of Second/Joint Inventor: Peter Hsu 

Inventor's Signature [illegible] 

Residence Fremont, California    Citizenship 

P.O. Address 2853 Welk Common, Fremont, California 94555 

Full Name of Third/Joint Inventor: William A. Huffman 

Inventor's Signature                Date 

Residence Los Gatos, California    Citizenship USA 

P.O. Address 16205 Roseleaf Lane, Los Gatos, California 95032 




Full Name of Fourth/Joint Inventor: Henry P. Moreton 

Inventor's Signature                Date 

Residence Woodside, California    Citizenship USA 

P.O. Address 140 Phillip Road, Woodside, California [illegible] 



Full Name of Fifth/Joint Inventor: Earl A. Killian 

Inventor's Signature                Date 

Residence Los Altos Hills, California    Citizenship USA 

(City, State) 
P.O. Address 27961 Central Drive, Los Altos Hills, California 94022 



Declaration and Power of Attorney 
for a Patent Application 

Declaration 

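As below named inventor, I hereby declare that my residence, post office address, and citizenship are as stated 
below my name. Further, I hereby declare that I believe I am the original, first and sole inventor (if only one name is 
listed below) or an original, first and joint inventor (if plural names are listed below) of the subject matter which is 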
claimed and for which a patent is sought on the invention entitled: 

ALIGNMENT AND ORDERING OF VECTOR ELEMENTS FOR SINGLE INSTRUCTION MULTIPLE DATA 
PROCESSING 

the specification of which: 

is attached hereto, or 

X was filed on 10/9/97 as application serial no. 08/947,649; and 

was amended on 



I hereby state that I have reviewed and understand the contents of the above identified specification, including 
the claims, as amended by any amendment referred to above; and 

I acknowledge the duty to disclose information which is material to the examination of this application in 
accordance with Title 37, Code of Federal Regulations, Section 1.56(a). 



Foreign Priority Claim 

I hereby claim foreign priority benefits under Title 35, United States Code Section 119 of any foreign application(s) 
for patent or inventor's certificate listed below and have also identified below any foreign application for patent or 
inventor's certificate having a filing date before that of the application on which priority is claimed: 

Number Country Date Filed Priority Claimed 

yes no 
yes no 



U.S. Priority Claim 

I hereby claim the benefit under Title 35, United States Code, Section 120 of any United States application(s) 
listed below and, insofar as the subject matter of each of the claims of this application is not disclosed in the prior 
United States application in the manner provided by the first paragraph of Title 35, United States Code, Section 
112, I acknowledge the duty to disclose material information as defined in Title 37, Code of Federal Regulations, 
Section 1.56(a) which occurred between the filing date of the prior application and the national or PCT 
international filing date of this application: 

Serial Number Filing Date Status (patented/pending/abandoned) 



Power of Attorney 

As a named inventor, I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and 
transact all business in the Patent Trademark Office connected therewith. 



James P. Hao             Registration No.: 36,398 
Anthony C. Murabito      Registration No.: 35,295 
John P. Wagner           Registration No.: 35,398 
Glenn D. B[illegible]    Registration No.: P-42,293 
Wilfred H. Lam           Registration No.: P-41,923 
Steve Weiner             Registration No.: 38,330 
Chris Byrne              Registration No.: 32,204 
Irene Fernandez          Registration No.: 34,625 
John Brigden             Registration No.: 40,530 

Send Correspondence to: 

WAGNER, MURABITO & HAO 
Two North Market Street, Third Floor 
San Jose, California 95113 
(408) 938-9060 



Signatures 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on 
information and belief are believed to be true; and further that these statements were made with the knowledge 
that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 

Full Name of Sole/First Inventor: Timothy J. van Hook 

Inventor's Signature                Date 

Residence Atherton, California    Citizenship USA 

(City, State) 
P.O. Address 224 Oakgrove Road, Atherton, California 94027 



Full Name of Second/Joint Inventor: Peter Hsu 



Inventor's Signature Date 

Residence Fremont, California    Citizenship 

(City, State) 

P.O. Address _2853 Welk Common, Fremont, California 94555 



Full Name of Third/Joint Inventor: William A. Huffman 

Inventor's Signature [illegible]    Date [illegible] 

Residence Los Gatos, California    Citizenship [illegible] 

(City, State) 
P.O. Address 16205 Roseleaf Lane, Los Gatos, California 95032 



Full Name of Fourth/Joint Inventor: Henry P. Moreton 

Inventor's Signature                Date 

Residence Woodside, California    Citizenship USA 

(City, State) 
P.O. Address 140 Phillip Road, Woodside, California 94062-[illegible] 



Full Name of Fifth/Joint Inventor: Earl A, Killian 



Inventor's Signature                Date 

Residence Los Altos Hills, California    Citizenship USA 

P.O. Address 27961 Central Drive, Los Altos Hills, California 94022 

Declaration and Power of Attorney 
for a Patent Application 

Declaration 

As below named inventor, I hereby declare that my residence, post office address, and citizenship are as stated 
below my name. Further, I hereby declare that I believe I am the original, first and sole inventor (if only one name is 
listed below) or an original, first and joint inventor (if plural names are listed below) of the subject matter which is 
claimed and for which a patent is sought on the invention entitled: 

ALIGNMENT AND ORDERING OF VECTOR ELEMENTS FOR SINGLE INSTRUCTION MULTIPLE DATA 
PROCESSING 

the specification of which: 

is attached hereto, or 

X was filed on 10/9/97 as application serial no. 08/947,649; and 

was amended on 

I hereby state that I have reviewed and understand the contents of the above identified specification, including 
the claims, as amended by any amendment referred to above; and 

I acknowledge the duty to disclose information which is material to the examination of this application in 
accordance with Title 37, Code of Federal Regulations, Section 1.56(a). 



Foreign Priority Claim 

I hereby claim foreign priority benefits under Title 35, United States Code Section 119 of any foreign application(s) 
for patent or inventor's certificate listed below and have also identified below any foreign application for patent or 
inventor's certificate having a filing date before that of the application on which priority is claimed: 

Number Country Date Filed Priority Claimed 

yes no 

yes no 



U.S. Priority Claim 

I hereby claim the benefit under Title 35, United States Code, Section 120 of any United States application(s) 
listed below and, insofar as the subject matter of each of the claims of this application is not disclosed in the prior 
United States application in the manner provided by the first paragraph of Title 35, United States Code, Section 
112, I acknowledge the duty to disclose material information as defined in Title 37, Code of Federal Regulations, 
Section 1.56(a) which occurred between the filing date of the prior application and the national or PCT 
international filing date of this application: 

Serial Number Filing Date Status (patented/pending/abandoned) 



Power of Attorney 

As a named inventor, I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and 
transact all business in the Patent Trademark Office connected therewith. 



James P. Hao             Registration No.: 36,398 
Anthony C. Murabito      Registration No.: 35,295 
John P. Wagner           Registration No.: 35,398 
Glenn D. B[illegible]    Registration No.: P-42,293 
Wilfred H. Lam           Registration No.: P-41,923 
Steve Weiner             Registration No.: 38,330 
Chris Byrne              Registration No.: 32,204 
Irene Fernandez          Registration No.: 34,625 
John Brigden             Registration No.: 40,530 

Send Correspondence to: 

WAGNER, MURABITO & HAO 
Two North Market Street, Third Floor 
San Jose, California 95113 
(408) 938-9060 



Signatures 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on 
information and belief are believed to be true; and further that these statements were made with the knowledge 
that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 

Full Name of Sole/First Inventor: Timothy J. van Hook 

Inventor's Signature                Date 

Residence Atherton, California    Citizenship USA 

(City, State) 
P.O. Address 224 Oakgrove Road, Atherton, California 94027 



Full Name of Second/Joint Inventor: Peter Hsu 



Inventor's Signature Date 

Residence Fremont, California    Citizenship 

(City, State) 
P.O. Address 2853 Welk Common, Fremont, California 94555 



Full Name of Third/Joint Inventor: William A. Huffman 

Inventor's Signature                Date 

Residence Los Gatos, California    Citizenship USA 

(City, State) 

P.O. Address 16205 Roseleaf Lane, Los Gatos, California 95032 




Full Name of Fourth/Joint Inventor: Henry P. Moreton 

Inventor's Signature [illegible]    Date 

Residence Woodside, California    Citizenship USA 

(City, State) 

P.O. Address [illegible] 

Full Name of Fifth/Joint Inventor: Earl A. Killian 



Inventor's Signature Date 

Residence Los Altos Hills    Citizenship USA 

(City, State) 

P.O. Address 27961 Central Drive, Los Altos Hills, California 94022 

Declaration and Power of Attorney 
for a Patent Application 

Declaration 

As below named inventor, I hereby declare that my residence, post office address, and citizenship are as stated 
below my name. Further, I hereby declare that I believe I am the original, first and sole inventor (if only one name is 
listed below) or an original, first and joint inventor (if plural names are listed below) of the subject matter which is 
claimed and for which a patent is sought on the invention entitled: 

ALIGNMENT AND ORDERING OF VECTOR ELEMENTS FOR SINGLE INSTRUCTION MULTIPLE DATA 
PROCESSING 

the specification of which: 

is attached hereto, or 

X was filed on 10/9/97 as application serial no. 08/947,649; and 
was amended on 

I hereby state that I have reviewed and understand the contents of the above identified specification, including 
the claims, as amended by any amendment referred to above; and 

I acknowledge the duty to disclose information which is material to the examination of this application in 
accordance with Title 37, Code of Federal Regulations, Section 1.56(a). 



Foreign Priority Claim 

I hereby claim foreign priority benefits under Title 35, United States Code Section 119 of any foreign application(s) 
for patent or inventor's certificate listed below and have also identified below any foreign application for patent or 
inventor's certificate having a filing date before that of the application on which priority is claimed: 

Number Country Date Filed Priority Claimed 

yes no 
yes no 



U.S. Priority Claim 

I hereby claim the benefit under Title 35, United States Code, Section 120 of any United States application(s) 
listed below and, insofar as the subject matter of each of the claims of this application is not disclosed in the prior 
United States application in the manner provided by the first paragraph of Title 35, United States Code, Section 
112, I acknowledge the duty to disclose material information as defined in Title 37, Code of Federal Regulations, 
Section 1.56(a) which occurred between the filing date of the prior application and the national or PCT 
international filing date of this application: 

Serial Number Filing Date Status (patented/pending/abandoned) 






Power of Attorney 

As a named inventor, I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and 
transact all business in the Patent Trademark Office connected therewith. 



James P. Hao             Registration No.: 36,398 
Anthony C. Murabito      Registration No.: 35,295 
John P. Wagner           Registration No.: 35,398 
Glenn D. B[illegible]    Registration No.: P-42,293 
Wilfred H. Lam           Registration No.: P-41,923 
Steve Weiner             Registration No.: 38,330 
Chris Byrne              Registration No.: 32,204 
Irene Fernandez          Registration No.: 34,625 
John Brigden             Registration No.: 40,530 

Send Correspondence to: 

WAGNER, MURABITO & HAO 
Two North Market Street, Third Floor 
San Jose, California 95113 
(408) 938-9060 



Signatures 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on 
information and belief are believed to be true; and further that these statements were made with the knowledge 
that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 



Full Name of Sole/First Inventor: Timothy J. van Hook 

Inventor's Signature                Date 

Residence Atherton, California    Citizenship USA 
P.O. Address 224 Oakgrove Road, Atherton, California 94027 



Full Name of Second/Joint Inventor: Peter Hsu 



Inventor's Signature Date 

Residence Fremont, California    Citizenship 

(City, State) 
P.O. Address 2853 Welk Common, Fremont, California 94555 



Full Name of Third/Joint Inventor: William A. Huffman 

Inventor's Signature                Date 

Residence Los Gatos, California    Citizenship USA 

(City, State) 

P.O. Address 16205 Roseleaf Lane, Los Gatos, California 95032 




Full Name of Fourth/Joint Inventor: Henry P. Moreton 

Inventor's Signature                Date 

Residence Woodside    Citizenship USA 

P.O. Address [illegible] 

Full Name of Fifth/Joint Inventor: Earl A. Killian 

Inventor's Signature [illegible]    Date [illegible] 

Residence Los Altos Hills, California    Citizenship USA 

P.O. Address 27961 Central Drive, Los Altos Hills, California 94022 
Declaration for Patent Application 



Docket Number: 1778.0100000 



As a below named inventor, I hereby declare that: 

My residence, mailing address and citizenship are as stated below next to my name. 

I believe I am an original, first and joint inventor of the subject matter that is claimed and for which a patent is sought on the 
invention entitled Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing, the 

specification of which is attached hereto unless the following box is checked: 

☒ was filed on October 9, 1997; 

as United States Application Number 08/947,649 (now U.S. Patent No. 5,933,650; Issued: August 3, 1999); and 
was amended on January 6, 1999, and February 1, 1999. 

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as 
amended by any amendment referred to above. 

I acknowledge the duty to disclose information that is material to patentability as defined in 37 C.F.R. § 1.56, including for 
continuation-in-part applications, material information which became available between the filing date of the prior 
application and the national or PCT international filing date of the continuation-in-part application. 

I hereby claim foreign priority benefits under 35 U.S.C. § 119(a)-(d) or (f) or § 365(b) of any foreign application(s) for 
patent, inventor's or plant breeder's rights certificate(s), or § 365(a) of any PCT international application, which designated 
at least one country other than the United States of America, listed below, and have also identified below, by checking the 
box, any foreign application for patent, inventor's or plant breeder's rights certificate(s), or PCT international application 
having a filing date before that of the application on which priority is claimed. 

Prior Foreign Application(s):                                          Priority Claimed 

(Application No.)    (Country)    (Day/Month/Year Filed)              □ Yes    □ No 

(Application No.)    (Country)    (Day/Month/Year Filed)              □ Yes    □ No 



Send Correspondence to: 



Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue, N.W. 
Washington, D.C. 20005-3934 



Direct Telephone Calls to: 



(202) 371-2600 
Appl No. 08/947,649 
Docket No. 1778.0100000 



I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information 
and belief are believed to be true; and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under 18 U.S.C. § 1001 and that such 
willful false statements may jeopardize the validity of the application or any patent issued thereon. 



Full name of first Inventor: 


Timothy J. Van Hook 




Signature of first Inventor: 




Date: 


Residence: 


Atherton, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


224 Oakgrove Road 
Atherton, CA, 94027 






Full name of second Inventor: 


Peter Yan-Tek Hsu 




Signature of second Inventor: 




Date: 


Residence: 


San Francisco, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


1 Rausch Street, Unit F 
San Francisco, CA, 94103 






Full name of third Inventor: 


William A. Huffman 




Signature of third Inventor: 




Date: 


Residence: 


Los Gatos, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


16205 Roseleaf Lane 

Los Gatos, CA, 95032-3610 



Full name of fourth Inventor: 


Henry P. Moreton 




Signature of fourth Inventor: 




Date: 


Residence: 


Woodside, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


140 Phillip Road 
Woodside, CA, 94062-2625 






Full name of fifth Inventor: 


Earl A. Killian 




Signature of fifth Inventor: 




Date: 


Residence: 


Los Altos Hills, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


27961 Central Drive 

Los Altos Hills, CA, 94022 
Appl No. 09/662,832 
Docket No. 1778.0100002 



Full name of fourth Inventor: 


Henry P. Moreton 




Signature of fourth Inventor: 




Date: 


Residence: 


Woodside, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


140 Phillip Road 
Woodside, CA, 94062-2625 






Full name of fifth Inventor: 


Earl A. Killian 




Signature of fifth Inventor: 




Date: 


Residence: 


Los Altos Hills, CA 




Citizenship: 


U.S.A. 




Mailing Address: 


27691 Central Drive 

Los Altos Hills, CA, 94022 
37 C.F.R. § 10.18(b) and (c): Effect of Signature and Certificate 
for Correspondence Filed in the Patent and Trademark Office 



37 C.F.R. § 10.18(b): By presenting to the Office, (whether by signing, filing, submitting, or later 
advocating), any paper, the party presenting such paper, whether a practitioner or non-practitioner, is certifying that: 

(1) All statements made therein of the party's own knowledge are true, all statements made therein on 
information and belief are believed to be true, and all statements made therein are made with the 
knowledge that whoever, in any matter within the jurisdiction of the USPTO, knowingly and 
willfully falsifies, conceals, or covers up by any trick, scheme, or device a material fact, or makes 
any false, fictitious or fraudulent statements or representations, or makes or uses any false writing 
or document knowing the same to contain any false, fictitious or fraudulent statement or entry, 
shall be subject to the penalties set forth under 18 U.S.C. § 1001, and that violations of this 
paragraph may jeopardize the validity of the application or document, or the validity or 
enforceability of any patent, trademark registration, or certificate resulting therefrom; and 



(2) To the best of the party's knowledge, information and belief, formed after an inquiry reasonable 
under the circumstances, that: 



(i) The paper is not being presented for any improper purpose, such as to 
harass someone or to cause unnecessary delay or needless increase in 
the cost of prosecution before the Office; 

(ii) The claims and other legal contentions therein are warranted by 
existing law or by a nonfrivolous argument for the extension, 
modification, or reversal of existing law or the establishment of new 
law; 

(iii) The allegations and other factual contentions have evidentiary support 
or, if specifically so identified, are likely to have evidentiary support 
after a reasonable opportunity for further investigation or discovery; 
and 

(iv) The denials of factual contentions are warranted on the evidence, or if 
specifically so identified, are reasonably based on a lack of information 
or belief. 



37 C.F.R. § 10.18(c): Violations of paragraph (b)(1), by a practitioner or a non-practitioner, may jeopardize 
the validity of the application or document, or the validity or enforceability of any patent, trademark registration, or 
certificate resulting therefrom. Violations of any of paragraphs (b)(2)(i) through (iv) of this section are, after notice 
and reasonable opportunity to respond, subject to such sanctions as deemed appropriate by the Commissioner, or the 
Commissioner's designee, which may include, but are not limited to, any combination of: 



(1) Holding certain facts to have been established; 

(2) Returning papers; 

(3) Precluding a party from filing a paper, or presenting or contesting an issue; 

(4) Imposing a monetary sanction; 

(5) Requiring a terminal disclaimer for the period of the delay; or 

(6) Terminating the proceedings in the Patent and Trademark Office. 



© 2000 Sterne, Kessler, Goldstein & Fox P.L.L.C. 
Kristy Dahl - FedEx shipment 791344489555 



From: FedEx <donotreply@fedex.com> 

To: <donf@skgf.com> 

Date: 9/23/2004 3:12 PM 

Subject: FedEx shipment 791344489555 



Our records indicate that the shipment sent from Donald J. Featherstone/Sterne Kessler Go 
to Henry P. Moreton has been delivered. 

The package was delivered on 09/23/2004 at 10:49 AM and signed for 
or released by K.GREEN. 

The ship date of the shipment was 09/22/2004. 

The tracking number of this shipment was 791344489555. 

FedEx appreciates your business. For more information about FedEx services, 
please visit our web site at http://www.fedex.com 

To track the status of this shipment online please use the following: 
http://www.fedex.com/cgi-bin/tracking?tracknumbers=791344489555&action=track&language=english&cntry_code=us 
Disclaimer 



FedEx has not validated the authenticity of any email address. 
Kristy Dahl - FedEx shipment 791343579263 



From: FedEx <donotreply@fedex.com> 

To: <donf@skgf.com> 

Date: 9/23/2004 4:10 PM 

Subject: FedEx shipment 791343579263 



Our records indicate that the shipment sent from Donald J. Featherstone/Sterne Kessler Go 
to Timothy J. Van Hook has been delivered. 

The package was delivered on 09/23/2004 at 12:50 PM and signed for 
or released by .VANHOOK. 

The ship date of the shipment was 09/22/2004. 

The tracking number of this shipment was 791343579263. 

FedEx appreciates your business. For more information about FedEx services, 
please visit our web site at http://www.fedex.com 

To track the status of this shipment online please use the following: 
http://www.fedex.com/cgi-bin/tracking?tracknumbers=791343579263&action=track&language=english&cntry_code=us 
Disclaimer 



FedEx has not validated the authenticity of any email address. 
Kristy Dahl - FedEx shipment 792736103470 



From: FedEx <donotreply@fedex.com> 

To: <donf@skgf.com> 

Date: 9/23/2004 10:41 PM 

Subject: FedEx shipment 792736103470 



Our records indicate that the shipment sent from Donald J. Featherstone/Sterne Kessler Go 
to William A. Huffman has been delivered. 

The package was delivered on 09/23/2004 at 7:43 PM and signed for 
or released by W.HUFFMAN. 

The ship date of the shipment was 09/22/2004. 

The tracking number of this shipment was 792736103470. 

FedEx appreciates your business. For more information about FedEx services, 
please visit our web site at http://www.fedex.com 

To track the status of this shipment online please use the following: 
http://www.fedex.com/cgi-bin/tracking?tracknumbers=792736103470&action=track&language=english&cntry_code=us 
Disclaimer 



FedEx has not validated the authenticity of any email address. 






9/28/2004 






Kristy Dahl - FedEx shipment 791938175275 



From: FedEx <donotreply@fedex.com> 

To: <donf@skgf.com> 



Date: 9/23/2004 3:02 PM 

Subject: FedEx shipment 791938175275 



Our records indicate that the shipment sent from Donald J. Featherstone/Sterne Kessler Go 
to Earl A. Killian has been delivered. 

The package was delivered on 09/23/2004 at 11:09 AM and signed for 
or released by L.EE. 

The ship date of the shipment was 09/22/2004. 

The tracking number of this shipment was 791938175275. 

FedEx appreciates your business. For more information about FedEx services, 
please visit our web site at http://www.fedex.com 

To track the status of this shipment online please use the following: 
http://www.fedex.com/cgi-bin/tracking?tracknumbers=791938175275&action=track&language=english&cntry_code=us 
Disclaimer 



FedEx has not validated the authenticity of any email address. 
Kristy Dahl - FedEx shipment 792095057606 






From: FedEx <donotreply@fedex.com> 

To: <donf@skgf.com> 

Date: 9/27/2004 1:54 PM 

Subject: FedEx shipment 792095057606 



Our records indicate that the shipment sent from Donald J. Featherstone/Sterne Kessler Go 
to Peter Yan-Tek Hsu has been delivered. 

The package was delivered on 09/27/2004 at 10:47 AM and signed for 
or released by J.TRACY. 

The ship date of the shipment was 09/22/2004. 

The tracking number of this shipment was 792095057606. 

FedEx appreciates your business. For more information about FedEx services, 
please visit our web site at http://www.fedex.com 

To track the status of this shipment online please use the following: 
http://www.fedex.com/cgi-bin/tracking?tracknumbers=792095057606&action=track&language=english&cntry_code=us 
Disclaimer 



FedEx has not validated the authenticity of any email address. 
LuAnne DeSantis - Request for New Executed Declarations (1778.010) 






From: LuAnne DeSantis 

To: peterhsu@cs.wisc.edu 

Date: 11/30/04 4:22PM 

Subject: Request for New Executed Declarations (1778.010) 

Re: Request for New Executed Declarations 

U.S. Pat. No. 5,933,650, Issued: August 3, 1999 

U.S. Pat. No. 6,266,758, Issued: July 24, 2001 

U.S. Pat. Appl. No. 09/662,832, Filed: September 15, 2000 

For: Alignment and Ordering of Vector Elements for 

Single Instruction Multiple Data Processing 
Inventors: Van Hook et al. 

SKGF Refs.: 1778.0100000, 1778.0100001, 1778.0100002 
Dear Mr. Hsu: 

We mailed a package to you (on September 22, 2004) requesting that you review the above-listed patents 
and application and that you sign and return new declarations for those patents and application. 
According to our records, the package was delivered to you on September 27, 2004 (signed for by 
"J.TRACY"). As explained in our letter accompanying the package, we are making this request, at the 
direction of the United States Patent and Trademark Office (USPTO), in order to correct an alleged filing 
error made by a previous law firm. 

I unsuccessfully tried to reach you by email on October 20, 2004, and by telephone on October 29, 2004, 
to ask if you found time to review the package. Assuming you have now had time to review the package, I 
am contacting you to ask if you have any questions that may expedite your response to our request. If you 
do have any questions or would like to discuss the contents of the package, please feel free to call me at 
(202) 772-8657. 

Please note that we need to file a response with the USPTO as soon as possible, and no later than 
December 14, 2004. If we do not receive a response from you by Monday, December 13, 2004, we will 
assume that you refuse to cooperate in prosecuting / maintaining the above-captioned patent application / 
patents. If you decide not to sign the declarations, we would appreciate a response that informs us of your 
refusal to sign. 

Best regards, 
LuAnne M. DeSantis 

This electronic message transmission contains information from the law firm of Sterne, Kessler, Goldstein 
& Fox P.L.L.C. which may be confidential or privileged. The information is intended to be for the use of 
the individual or entity named above. If you are not the intended recipient, be aware that any disclosure, 
copying, distribution or use of the contents of this information is prohibited. If you have received this 
electronic transmission in error, please notify us by telephone or by electronic mail immediately and delete 
the message without copying or disclosing it. 

LuAnne M. DeSantis, Associate 
Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue N.W. 
Washington, D.C. 20005 

office: (202)772-8657 

main: (202)371-2600 

fax: (202)371-2540 

email: ldesanti@skgf.com 
CC: Don Featherstone 
From: LuAnne DeSantis 

To: huffman@tensilica.com 

Date: 10/20/04 4:28PM 

Subject: Recent Request for New Executed Declarations (1778.010) 



Re: Recent Request for New Executed Declarations 
U.S. Pat. No. 5,933,650, Issued: August 3, 1999 
U.S. Pat. No. 6,266,758, Issued: July 24, 2001 
U.S. Pat. Appl. No. 09/662,832, Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for 

Single Instruction Multiple Data Processing 
Inventors: Van Hook et al. 

SKGF Refs.: 1778.010000, 1778.0100001, 1778.0100002 
Dear Mr. Huffman: 

Further to my telephone conversation with you on September 9, 2004, we mailed a package to you (on 
September 22, 2004) requesting that you review the above-listed patents and application and that you sign 
and return new declarations for those patents and application. As explained in our letter accompanying 
the package, we are making this request, at the direction of the United States Patent and Trademark 
Office, in order to correct an alleged filing error made by a previous law firm. 

According to our records, the package was delivered to you on September 23, 2004. Assuming you have 
had time to review the package, I am contacting you to ask if you have any questions that may expedite 
your response to our request. If you do have any questions or would like to discuss the contents of the 
package, please feel free to call me at (202) 772-8657. 

Best regards, 
LuAnne M. DeSantis 

LuAnne M. DeSantis, Associate 
Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue N.W. 
Washington, D.C. 20005 

office: (202) 772-8657 

main: (202)371-2600 

fax: (202)371-2540 

email: ldesanti@skgf.com 



CC: Don Featherstone 

From: LuAnne DeSantis 

To: huffman@tensilica.com 

Date: 11/30/04 4:22PM 

Subject: Request for New Executed Declarations (1778.010) 



Re: Request for New Executed Declarations 

U.S. Pat. No. 5,933,650, Issued: August 3, 1999 

U.S. Pat. No. 6,266,758, Issued: July 24, 2001 

U.S. Pat. Appl. No. 09/662,832, Filed: September 15, 2000 

For: Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing 
Inventors: Van Hook et al. 

SKGF Refs.: 1778.0100000, 1778.0100001, 1778.0100002 
Dear Mr. Huffman: 

We mailed a package to you (on September 22, 2004) requesting that you review the above-listed patents 
and application and that you sign and return new declarations for those patents and application. As 
explained in our letter accompanying the package, we are making this request, at the direction of the 
United States Patent and Trademark Office (USPTO), in order to correct an alleged filing error made by a 
previous law firm. 

I wanted to thank you for your voicemail message of October 30, 2004, in which you stated that you had 
not yet reviewed the package and therefore did not yet have any questions. Assuming you have now had 
time to review the package, I am contacting you to ask if you have any questions that may expedite your 
response to our request. If you do have any questions or would like to discuss the contents of the 
package, please feel free to call me at (202) 772-8657. 

Please note that we need to file a response with the USPTO as soon as possible, and no later than 
December 14, 2004. If we do not receive a response from you by Monday, December 13, 2004, we will 
assume that you refuse to cooperate in prosecuting / maintaining the above-captioned patent application / 
patents. If you decide not to sign the declarations, we would appreciate a response that informs us of your 
refusal to sign. 

Best regards, 
LuAnne M. DeSantis 

LuAnne M. DeSantis, Associate 
Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue N.W. 
Washington, D.C. 20005 

office: (202)772-8657 

main: (202)371-2600 

fax: (202)371-2540 

email: ldesanti@skgf.com 

CC: Don Featherstone 



December 7, 2004 



Donald J. Featherstone 
Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue, NW 
Washington, DC 20005-3934 



Dear Mr. Featherstone, 

I have received your request for new declaration documents for U.S. Patent 
No. 5,933,650, U.S. Patent No. 6,266,758, and U.S. Patent Application No. 
09/622,832. I am familiar with Patent No. 5,933,650 from my original 
involvement with its filing and am glad to provide a new declaration to help 
with the apparently inadequate request for Rule 1.47 Status in the initial 
application. Unfortunately, I have never seen Patent No. 6,266,758 or Patent 
Application No. 09/622,832 nor do I have the time to read them. I am, 
therefore, unable to assist you with the two additional declarations. 




Sincerely, 



DEC 10 2004 

Sterne, Kessler, Goldstein & Fox P.L.L.C. 

From: LuAnne DeSantis 

To: earl@killian.com 

Date: 10/25/04 8:09AM 

Subject: Recent Request for New Executed Declarations (1778.010) 



Re: Recent Request for New Executed Declarations 
U.S. Pat. No. 5,933,650, Issued: August 3, 1999 
U.S. Pat. No. 6,266,758, Issued: July 24, 2001 
U.S. Pat. Appl. No. 09/662,832, Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing 
Inventors: Van Hook et al. 

SKGF Refs.: 1778.0100000, 1778.0100001, 1778.0100002 
Dear Mr. Killian: 

Further to my voicemail of September 1, 2004, we mailed a package to you (on September 22, 2004) 
requesting that you review the above-listed patents and application and that you sign and return new 
declarations for those patents and application. As explained in our letter accompanying the package, we 
are making this request, at the direction of the United States Patent and Trademark Office, in order to 
correct an alleged filing error made by a previous law firm. 

According to our records, the package was delivered to you on September 23, 2004 (signed for by 
"L.EE"). Assuming you have had time to review the package, I am contacting you to ask if you have any 
questions that may expedite your response to our request. If you do have any questions or would like to 
discuss the contents of the package, please feel free to call me at (202) 772-8657. 

Best regards, 
LuAnne M. DeSantis 

LuAnne M. DeSantis, Associate 
Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue N.W. 
Washington, D.C. 20005 

office: (202)772-8657 

main: (202) 371-2600 

fax: (202)371-2540 

email: ldesanti@skgf.com 



CC: Don Featherstone 

From: LuAnne DeSantis 

To: earl@killian.com 

Date: 11/30/04 4:22PM 

Subject: Request for New Executed Declarations (1778.010) 

Re: Request for New Executed Declarations 

U.S. Pat. No. 5,933,650, Issued: August 3, 1999 

U.S. Pat. No. 6,266,758, Issued: July 24, 2001 

U.S. Pat. Appl. No. 09/662,832, Filed: September 15, 2000 

For: Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing 
Inventors: Van Hook et al. 

SKGF Refs.: 1778.0100000, 1778.0100001, 1778.0100002 
Dear Mr. Killian: 

We mailed a package to you (on September 22, 2004) requesting that you review the above-listed patents 
and application and that you sign and return new declarations for those patents and application. As 
explained in our letter accompanying the package, we are making this request, at the direction of the 
United States Patent and Trademark Office (USPTO), in order to correct an alleged filing error made by a 
previous law firm. 

In a telephone conversation with you on October 29, 2004, you stated that you would consider reviewing 
the package and call if you had any questions. Assuming you found time to review the package, I am 
contacting you to ask if you have any questions that may expedite your response to our request. If you do 
have any questions or would like to discuss the contents of the package, please feel free to call me at 
(202) 772-8657. 

Please note that we need to file a response with the USPTO as soon as possible, and no later than 
December 14, 2004. If we do not receive a response from you by Monday, December 13, 2004, we will 
assume that you refuse to cooperate in prosecuting / maintaining the above-captioned patent application / 
patents. If you decide not to sign the declarations, we would appreciate a response that informs us of your 
refusal to sign. 

Best regards, 
LuAnne M. DeSantis 

LuAnne M. DeSantis, Associate 
Sterne, Kessler, Goldstein & Fox P.L.L.C. 
1100 New York Avenue N.W. 
Washington, D.C. 20005 

office: (202)772-8657 

main: (202)371-2600 

fax: (202)371-2540 

email: ldesanti@skgf.com 

From: "Earl A. Killian" <earl@killian.com> 

To: "LuAnne DeSantis" <LDESANTI@skgf.com> 

Date: 12/1/2004 11:59 AM 

Subject: Re: Request for New Executed Declarations (1778.010) 

CC: "Don Featherstone" <DONF@skgf.com> 



I have not yet had a chance to review the revised declaration, 
and since I do not feel it is possible to sign it without 
a thorough review, I suggest you go ahead and execute the 
application without my signature, as you indicated you would 
do for some of the other applicants on the phone. 

I apologize that I cannot be more helpful, but the reading 
patents is particularly tedious work, and I don't feel like 
this is a good use of my time just now, 

-Earl 

ALIGNMENT AND ORDERING OF VECTOR ELEMENTS FOR SINGLE 
INSTRUCTION MULTIPLE DATA PROCESSING 



FIELD OF THE INVENTION 

The present invention relates to the field of single instruction multiple 
data (SIMD) vector processing. More particularly, the present claimed 
invention relates to alignment and ordering of vector elements for SIMD 
processing. 

BACKGROUND ART 

Today, most processors in microcomputer systems provide a 64-bit 
wide datapath architecture. The 64-bit datapath allows operations such as 
read, write, add, subtract, and multiply on the entire 64 bits of data at once. 
However, for many applications the types of data involved simply do not 
require the full 64 bits. In media signal processing (MDMX) applications, for 
example, the light and sound values are usually represented in 8-, 12-, 16-, 
or 24-bit numbers. This is because people typically are not able to distinguish the 
levels of light and sound beyond the levels represented by these numbers of 
bits. Hence, data types in MDMX applications typically require less than the 
full 64 bits provided in the datapath in most computer systems. 

To efficiently utilize the entire datapath, the current generation of 
processors typically utilizes a single instruction multiple data (SIMD) method. 
According to this method, a multitude of smaller numbers are packed into 
the 64-bit doubleword as elements, each of which is then operated on 
independently and in parallel. Prior Art Figure 1 illustrates an exemplary 
single instruction multiple data (SIMD) method. Registers vs and vt in a 
processor are of 64-bit width. Each register is packed with four 16-bit data 
elements fetched from memory: register vs contains vs[0], vs[1], vs[2], and 
vs[3], and register vt contains vt[0], vt[1], vt[2], and vt[3]. The registers in 
essence contain a vector of N elements. To add elements of matching index, 
an add instruction adds, independently, each of the element pairs of matching 
index from vs and vt. A third register, vd, of 64-bit width may be used to 
store the result. For example, vs[0] is added to vt[0] and the result is stored 
into vd[0]. Similarly, vd[1], vd[2], and vd[3] store the sums of the vs and vt 
elements of corresponding indexes. Hence, a single add operation on the 64-bit 
vector results in 4 simultaneous additions, one on each of the 16-bit elements. 
If, on the other hand, 8-bit elements were packed into the registers, one add 
operation would perform 8 independent additions in parallel. Consequently, 
when a SIMD arithmetic instruction such as addition, subtraction, or 
multiplication is performed on the data in the 64-bit datapath, the operation 
actually performs multiple operations independently and in parallel, one on 
each of the smaller elements comprising the 64-bit datapath. For SIMD vector 
operations, processors typically require loads to be aligned to the 64-bit 
doubleword data type size. This alignment ensures that the SIMD vector 
operations occur on 64-bit doubleword boundaries. 
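The packed add of Prior Art Figure 1 can be modeled in software to make the lane independence concrete. The following is a minimal C sketch, not the hardware described here; it assumes element vs[0] occupies the least significant 16 bits of the 64-bit register, and the function names add_qh and add_qh_swar are hypothetical. The first version adds each 16-bit lane separately; the second computes the same result with the common "SIMD within a register" masking technique, which prevents a carry out of one lane from spilling into the next.

```c
#include <assert.h>
#include <stdint.h>

/* Add four 16-bit lanes of vs and vt independently (modulo 2^16 per lane).
 * Assumption: lane i occupies bits 16*i .. 16*i+15 (lane 0 = least
 * significant). */
uint64_t add_qh(uint64_t vs, uint64_t vt)
{
    uint64_t vd = 0;
    for (int i = 0; i < 4; i++) {
        unsigned shift = 16u * (unsigned)i;
        uint16_t a = (uint16_t)(vs >> shift);
        uint16_t b = (uint16_t)(vt >> shift);
        uint16_t sum = (uint16_t)(a + b);   /* wraps; no carry into lane i+1 */
        vd |= (uint64_t)sum << shift;
    }
    return vd;
}

/* Same result without a loop: add the low 15 bits of every lane (the sum of
 * two 15-bit values fits in 16 bits, so no carry crosses a lane), then fix
 * up the top bit of each lane with XOR. */
uint64_t add_qh_swar(uint64_t vs, uint64_t vt)
{
    const uint64_t H = 0x8000800080008000ULL;   /* top bit of each lane */
    uint64_t low = (vs & ~H) + (vt & ~H);
    return low ^ ((vs ^ vt) & H);
}
```

For example, adding 0x0004000300020001 and 0x0040003000200010 yields 0x0044003300220011 with either function, and a lane holding 0xFFFF wraps to 0x0000 when 1 is added without disturbing its neighbor.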

Unfortunately, the elements within application data vectors are 
frequently not 64-bit doubleword aligned for SIMD operations. For example, 
data elements stored in a memory unit are loaded into registers in chunks, 
such as in a 64-bit doubleword format. To operate on the individual elements, 
the elements are loaded into a register. The order of the elements in the 
register remains the same as the order in the original memory. Accordingly, 
the elements may not be properly aligned for a SIMD operation. 

Traditionally, when elements are not aligned with a proper boundary 
as required for a SIMD vector operation, non-aligned vector processing 
has been reduced to scalar processing. That is, operations took place 
one element at a time instead of as simultaneous multiple operations. 
Consequently, SIMD vector operations lost their parallelism and performance 
advantages when the vector elements were not properly aligned. 

Furthermore, many media applications require a specific ordering for 
the elements within a SIMD vector. Since elements necessary for SIMD 
processing are commonly stored in multiple 64-bit doublewords with other 
elements, these elements need to be selected and assembled into a vector of 
the desired order. For example, multiple-channel data are commonly stored in 
separate arrays or interleaved in a single array. Processing the data requires 
interleaving or deinterleaving the multiple channels. Other applications 
require SIMD vector operations on transposed two-dimensional arrays of data. 
Yet other applications, such as FFTs, DCTs, and convolution algorithms, 
reverse the order of elements in an array. 

Thus, what is needed is a method for aligning and ordering elements 
that enables more efficient SIMD vector operations by providing computational 
parallelism. 

SUMMARY OF THE INVENTION 

The present invention provides alignment and ordering of vector 
elements for SIMD processing. The present invention is implemented in a 
computer system including a processor having a plurality of registers. In the 
alignment of vector elements for SIMD processing, one vector is loaded from 
a memory unit into a first register and another vector is loaded from the 
memory unit into a second register. The first vector contains the first byte of 
the aligned vector to be generated. Then, a starting byte specifying the first 
byte of the aligned vector is determined. Next, a vector is extracted from the 
first register and the second register, beginning from the first bit in the first 
byte of the first register and continuing through the bits in the second 
register. Finally, the extracted vector is replicated into a third register such 
that the third register contains a plurality of elements aligned for SIMD 
processing. 

In the ordering of vector elements for SIMD processing, a first vector is 
loaded from a memory unit into a first register and a second vector is loaded 
from the memory unit into a second register. Then, a subset of elements is 
selected from the first register and the second register. The elements from the 
subset are then replicated into the elements of a third register in a particular 
order suitable for subsequent SIMD vector processing. 

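The alignment step summarized above can be sketched in C under a few stated assumptions: byte k of a register is taken to be bits 8k through 8k+7 (so the lowest-addressed memory byte lands in the least significant position), vs holds the lower-addressed doubleword, and the starting byte is an offset from 0 to 7 within vs. The names align_extract and load_le64 are hypothetical, and the byte numbering of the patented hardware may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: extract the 8-byte vector that begins at byte `off`
 * of vs and continues into vt. Assumes byte k of a register is bits
 * 8k..8k+7 and that vs holds the lower-addressed doubleword. */
uint64_t align_extract(uint64_t vs, uint64_t vt, unsigned off)
{
    assert(off < 8);
    if (off == 0)
        return vs;                 /* already aligned; avoids a shift by 64 */
    return (vs >> (8 * off)) | (vt << (8 * (8 - off)));
}

/* Assemble a doubleword from 8 memory bytes, byte 0 least significant. */
uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int k = 7; k >= 0; k--)
        v = (v << 8) | p[k];
    return v;
}
```

With memory bytes 0x00 through 0x0F loaded as two aligned doublewords (vs = 0x0706050403020100, vt = 0x0F0E0D0C0B0A0908), align_extract(vs, vt, 3) returns 0x0A09080706050403, the doubleword that begins at byte address 3. The special case for an offset of 0 exists because shifting a 64-bit value by 64 is undefined in C.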
BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings, which are incorporated in and form a 
part of this specification, illustrate embodiments of the invention and, 
together with the description, serve to explain the principles of the invention: 


Prior Art Figure 1 illustrates an exemplary single instruction multiple 
data (SIMD) instruction method. 

Figure 2 illustrates a block diagram of an exemplary computer system 
for implementing the present invention. 

Figure 3 illustrates a block diagram of an exemplary datapath for 
aligning and ordering vector elements. 

Figure 4 illustrates a block diagram of an alignment unit in a processor 
for aligning a vector of elements. 

Figure 5 illustrates a flow diagram of the steps involved in extracting 
an aligned vector from two exemplary vectors. 

Figure 6A illustrates a block diagram of a full byte-mode crossbar circuit 
used in generating a vector of elements from elements of two vector registers. 

Figure 6B shows a more detailed diagram of the operation of an 
exemplary AND gate associated with element 7 in the first register, vs. 

Figure 7 illustrates shuffle operations for ordering 8-bit elements in a 
64-bit doubleword. 

Figure 8A illustrates a block diagram of a shuffle operation, which 
converts four unsigned upper bytes (i.e., 8 bits) in a source register to four 
16-bit halves in a destination register. 

Figure 8B illustrates a block diagram of a shuffle operation, which 
converts a vector of unsigned low 4 bytes from a source register to four 16-bit 
halves in a destination register. 

Figure 8C illustrates a block diagram of a shuffle operation, which 
converts a vector of signed upper 4 bytes from a source register to four 16-bit 
halves in a destination register by replicating the signs across the upper bytes 
in the halves. 

Figure 8D illustrates a block diagram of a shuffle operation, which 
converts a vector of signed low 4 bytes from a source register to four 16-bit 
halves in a destination register by replicating the signs across the upper bytes 
in the halves. 

Figure 8E illustrates a block diagram of a shuffle operation, which 
replicates the odd elements of 8 8-bit elements from each of two source 
registers into 8 elements in a destination vector register. 

Figure 8F illustrates a block diagram of a shuffle operation, which 
replicates the even elements of 8 8-bit elements from each of two source 
registers into 8 elements in a destination vector register. 

Figure 8G illustrates a block diagram of a shuffle operation, which 
replicates the upper 4 elements of 8 8-bit elements from each of two source 
registers into 8 elements in a destination vector register. 

Figure 8H illustrates a block diagram of a shuffle operation, which 
replicates the lower 4 elements of 8 8-bit elements from each of two source 
registers into 8 elements in a destination vector register. 

Figure 9 illustrates shuffle operations for ordering 16-bit elements in a 
64-bit doubleword. 

Figure 10A illustrates a block diagram of a shuffle operation, which 
replicates the upper 2 elements of 4 16-bit elements from each of two source 
registers into 4 elements in a destination vector register. 

Figure 10B illustrates a block diagram of a shuffle operation, which 
replicates the lower 2 elements of 4 16-bit elements from each of two source 
registers into 4 elements in a destination vector register. 

Figure 10C illustrates a block diagram of a shuffle operation, which 
replicates 2 odd elements of 4 16-bit elements from each of two source 
registers into 4 elements in a destination vector register. 

Figure 10D illustrates a block diagram of a shuffle operation, which 
replicates 2 even elements of 4 16-bit elements from each of two source 
registers into 4 elements in a destination vector register. 

Figure 10E illustrates a block diagram of a shuffle operation, which 
replicates even elements 0 and 2 from one source register into odd elements 1 
and 3 in a destination vector register and further replicates odd elements 1 
and 3 from another source register into the even elements 0 and 2, 
respectively, of the destination vector register. 

Figure 10F illustrates a block diagram of a shuffle operation, which 
replicates even elements 0 and 2 from one source register into odd elements 3 
and 1, respectively, in a destination vector register and further replicates odd 
elements 1 and 3 from another source register into the even elements 2 and 0, 
respectively, of the destination vector register. 

Figure 10G illustrates a block diagram of a shuffle operation, which 
replicates the upper 2 elements of 4 16-bit elements from each of two source 
registers into a destination vector register. 

Figure 10H illustrates a block diagram of a shuffle operation, which 
replicates the lower 2 elements of 4 16-bit elements from each of two source 
registers into a destination vector register. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

In the following detailed description of the present invention, 
numerous specific details are set forth in order to provide a thorough 
understanding of the present invention. However, it will be obvious to one 
skilled in the art that the present invention may be practiced without these 
specific details. In other instances, well-known methods, procedures, 
components, and circuits have not been described in detail so as not to 
unnecessarily obscure aspects of the present invention. 

The present invention, a method for providing alignment and 
ordering of vector elements for single-instruction multiple-data (SIMD) 
processing, is described. The preferred embodiment of the present invention 
provides elements aligned and ordered for an efficient SIMD vector operation 
in a processor having a 64-bit wide datapath within an exemplary computer 
system described below. Although such a datapath is exemplified herein, the 
present invention can be readily adapted to suit other datapaths of varying 
widths. 

COMPUTER SYSTEM ENVIRONMENT 
Figure 2 illustrates an exemplary computer system 200 comprised of a 
system bus 206 for communicating information, a processor 202 coupled with 
the bus 206 for processing information and instructions, a computer readable 
volatile memory unit 210 (e.g., random access memory, static RAM, dynamic 
RAM, etc.) coupled with the bus 206 for storing information and instructions 
for the processor 202, a computer readable non-volatile memory unit 208 (e.g., 
read only memory, programmable ROM, flash memory, EPROM, EEPROM, 
etc.) coupled with the bus 206 for storing static information and instructions 
for the processor 202. A vector register file 204 containing a plurality of 
registers is included in the processor 202. In the present invention, the term 
vector register file 204 encompasses any register file containing a plurality of 
registers and as such is not limited to vector register files. 

The computer system 200 of Figure 2 further includes a mass storage 
computer readable data storage device 212 (hard drive, floppy, CD-ROM, 
optical drive, etc.) such as a magnetic or optical disk and disk drive coupled 
with the bus 206 for storing information and instructions. Optionally, the 
computer system 200 may include a display device 214 coupled to the bus 206 
for displaying information to the user, an alphanumeric input device 216 
including alphanumeric and function keys coupled to the bus 206 for 
communicating information and command selections to the processor 202, a 
cursor control device 218 coupled to the bus 206 for communicating user 
input information and command selections to the processor 202, and a signal 
generating device 220 coupled to the bus 206 for communicating command 
selections to the processor 202. 

According to an exemplary embodiment of the present invention, the 
processor 202 includes a SIMD vector unit that functions as a coprocessor for 
or as an extension of the processor 202. The SIMD vector unit performs 
various arithmetic and logical operations on each data element within a 
SIMD vector in parallel. The SIMD vector unit utilizes the register files of the 
processor 202 to hold SIMD vectors. The present invention may include one 
or more SIMD vector units to perform specialized operations such as 
arithmetic operations, logical operations, etc. 

Figure 3 illustrates a block diagram of an exemplary datapath 300 for 
aligning and ordering vector elements. The datapath 300 includes a SIMD 
vector unit 302, an alignment unit 312, a register file 304, a crossbar circuit 314, 
and a vector load/store unit 302. The vector load/store unit 302 performs 
load and store functions. It loads a vector from memory into one of the 
registers in the register file 304. It also stores a vector from one of the registers 
in the register file 304 into main memory. The alignment unit 312 receives 
two vectors from two source registers such as vs 306 and vt 308. Then, the 
alignment unit 312 extracts an aligned vector from the two vectors and stores 
it into a destination register such as vd 310. The crossbar circuit 314 also 
receives two vectors from two exemplary source registers, vs 306 and vt 308. 
The crossbar circuit 314 then selects a set of elements from the source registers 
and routes each of the elements in the selected set to a specified element in the 
exemplary destination register, vd 310. In an alternative embodiment, the 
crossbar circuit 314 may receive one vector from a single source register and 
select a set of elements from the vector. The data path 318 allows a result to 
be forwarded to the register file 304 or to the vector load/store unit to be 
stored into main memory. 
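The selection and routing performed by the crossbar circuit 314 can be modeled as a table lookup. The sketch below assumes a full byte-mode crossbar in which each of the eight destination bytes may take any of the sixteen source bytes; the selector encoding (0 to 7 for bytes of vs, 8 to 15 for bytes of vt), the byte numbering, and the names crossbar and byte_of are illustrative assumptions, not the instruction encoding used by the invention.

```c
#include <assert.h>
#include <stdint.h>

/* Return byte k (0..7) of a 64-bit register; byte 0 = least significant. */
static uint8_t byte_of(uint64_t r, unsigned k)
{
    return (uint8_t)(r >> (8 * k));
}

/* Hypothetical byte-mode crossbar: destination byte j is taken from source
 * byte sel[j], where 0..7 name bytes of vs and 8..15 name bytes of vt. */
uint64_t crossbar(uint64_t vs, uint64_t vt, const uint8_t sel[8])
{
    uint64_t vd = 0;
    for (unsigned j = 0; j < 8; j++) {
        assert(sel[j] < 16);
        uint8_t b = sel[j] < 8 ? byte_of(vs, sel[j])
                               : byte_of(vt, sel[j] - 8u);
        vd |= (uint64_t)b << (8 * j);
    }
    return vd;
}
```

For instance, with vs = 0x0706050403020100 and vt = 0x0F0E0D0C0B0A0908, the selector {0, 2, 4, 6, 8, 10, 12, 14} gathers the even-numbered bytes of both registers into 0x0E0C0A0806040200, and {7, 6, 5, 4, 3, 2, 1, 0} reverses the bytes of vs into 0x0001020304050607. In this model, each fixed shuffle would simply correspond to one constant selector.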

The SIMD vector unit 302 represents a generic SIMD vector processing 
unit, which may be an arithmetic unit, logical unit, integer unit, etc. The 
SIMD vector unit 302 may receive either one or two vectors from one or two 
source registers. It should be appreciated that the present invention may 
include more than one SIMD vector unit performing various functions. The 
SIMD vector unit 302 may execute an operation specified in the instruction 
on each element within a vector in parallel. 



SGI-15-4-457.00 






October 7, 1997 



The exemplary vector register file 304 is preferably comprised of thirty-two 
64-bit general purpose registers. To this end, the preferred embodiment of the 
present invention utilizes the floating point registers (FPR) of a floating point 
unit (FPU) in the processor as its vector registers. In this shared arrangement, 
data is moved between the vector register file 304 and a memory unit through 
the vector load/store unit 302. These load and store operations are 
unformatted. That is, no format conversions are performed and therefore no 
floating-point exceptions can occur due to these operations. Similarly, data is 
moved between the vector register file 304 and the alignment unit 312, the 
crossbar circuit 314, or the SIMD vector unit 316 without format conversions, 
and thus no floating-point exception occurs. 

The present invention allows data types of 8-, 16-, 32-, or 64-bit fields. 
Hence, a 64-bit doubleword vector may contain 8 8-bit elements, 4 16-bit 
elements, 2 32-bit elements, or 1 64-bit element. According to this 
convention, vector registers of the present invention are interpreted in the 
following data formats: Quad Half (QH), Oct Byte (OB), Bi Word (BW), and 
Long (L). In QH format, a vector register is interpreted as having 16-bit 
elements. For example, a 64-bit vector register is interpreted as a vector of 4 
signed 16-bit integers. OB format interprets a vector register as being 
comprised of 8-bit elements. Hence, an exemplary 64-bit vector register is 
seen as a vector of 8 unsigned 8-bit integers. In BW format, a vector register is 
interpreted as having 2 32-bit elements. L format interprets a vector register 
as having a single 64-bit element. These data types are provided to be 
adaptable to various register sizes of a processor. As described above, data 
format conversion is not necessary between these formats and floating-point 
format. 
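The four interpretations differ only in how the same 64 bits are partitioned. A minimal C sketch of reading element i under each format follows; it assumes element 0 is the least significant field, and the enum vfmt, its VF_ constants, and the helper names are hypothetical. Because no conversion is involved, the helpers only shift and mask.

```c
#include <assert.h>
#include <stdint.h>

/* Element width in bits for each format: Oct Byte, Quad Half, Bi Word, Long. */
enum vfmt { VF_OB = 8, VF_QH = 16, VF_BW = 32, VF_L = 64 };

/* Number of elements a 64-bit register holds in format f. */
unsigned elem_count(enum vfmt f)
{
    return 64u / (unsigned)f;
}

/* Raw, zero-extended bits of element i (element 0 = least significant). */
uint64_t elem_bits(uint64_t reg, enum vfmt f, unsigned i)
{
    assert(i < elem_count(f));
    if (f == VF_L)
        return reg;                        /* one 64-bit element; avoids 1 << 64 */
    uint64_t mask = (UINT64_C(1) << (unsigned)f) - 1;
    return (reg >> (i * (unsigned)f)) & mask;
}

/* QH element i read as a signed 16-bit integer, as the text describes. */
int16_t qh_elem(uint64_t reg, unsigned i)
{
    int32_t v = (int32_t)elem_bits(reg, VF_QH, i);   /* 0 .. 65535 */
    if (v >= 0x8000)
        v -= 0x10000;                                /* apply two's-complement sign */
    return (int16_t)v;
}
```

For the register value 0xFFFF000200030004, QH element 0 reads as 4 and QH element 3 as -1, while the same bits read as OB element 7 give the unsigned value 0xFF.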

According to a preferred embodiment of the present invention, 
exemplary source registers, vs and vt, are each used to hold a set of vector 
elements. A third exemplary vector register, vd, is created from the source 
registers and holds a set of elements selected from the source registers. 
Although the registers vs, vt, and vd are used to associate vector registers 
with a set of vector elements, other vector registers are equally suitable for 
the present invention. 

LOAD/STORE INSTRUCTIONS 
The load and store instructions of the present invention use a special 
load/store unit to load and store a 64-bit doubleword between a register in a 
register file, such as an FPR, and a memory unit. The doubleword is loaded 
through the exemplary load/store unit 302 illustrated above in Figure 3. The 
load/store unit performs loading or storing of a doubleword with the upper 61 
bits of an effective address. The lowest 3 bits specify a byte address within the 
64-bit doubleword for alignment. 

According to a preferred embodiment, an effective address is formed by adding the contents of an index value in a general purpose register (GPR) to a base address in another GPR. The effective address is doubleword aligned: during the loading process, the last three bits of the effective address, bits 0 to 2, are ignored by treating them as 0s. These three bits contain the byte address for accessing individual bytes within a doubleword; hence, the effective address comprises bits 3 to 63. If the size of a register in the register file is 64 bits, the 64-bit data stored in memory at the effective address is fetched and loaded into the register. If, on the other hand, the size of the register in the register file is 32 bits, the lower 32 bits of the data are loaded into the vector register and the upper 32 bits of the data are loaded into the next register in sequence. Hence, a pair of 32-bit registers is used to hold 64-bit data from the memory.
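The load behavior described above can be modeled in C as follows. This is a minimal sketch under stated assumptions, not the patented hardware: memory is assumed to be an array of 64-bit doublewords (so the doubleword at byte address ea is mem[ea >> 3]), GPR contents are plain uint64_t values, and the names ea_doubleword, load_64, and load_32_pair are hypothetical.

```c
#include <stdint.h>

/* Hypothetical memory model: an array of 64-bit doublewords, so the
   doubleword at byte address ea is mem[ea >> 3]. */

/* Effective address = GPR index + GPR base, with the low three bits
   (the byte address within the doubleword) treated as 0s. */
uint64_t ea_doubleword(uint64_t gpr_index, uint64_t gpr_base)
{
    return (gpr_index + gpr_base) & ~(uint64_t)7;
}

/* 64-bit register: the whole doubleword is loaded. */
uint64_t load_64(const uint64_t *mem, uint64_t gpr_index, uint64_t gpr_base)
{
    return mem[ea_doubleword(gpr_index, gpr_base) >> 3];
}

/* 32-bit registers: the lower half goes into regs[0] (the named register),
   the upper half into regs[1] (the next register in sequence). */
void load_32_pair(const uint64_t *mem, uint64_t gpr_index,
                  uint64_t gpr_base, uint32_t regs[2])
{
    uint64_t dw = load_64(mem, gpr_index, gpr_base);
    regs[0] = (uint32_t)dw;
    regs[1] = (uint32_t)(dw >> 32);
}
```

For example, an index of 5 and a base of 8 give 13, which is forced down to the doubleword boundary at byte address 8.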


Conversely, the store instruction stores a doubleword from a vector register, such as an FPR, to the memory while ignoring alignment. The store operation is carried out through the exemplary load/store unit 302 illustrated above in Figure 3. The contents of a 64-bit doubleword in FPR fs are stored at the memory location specified by the effective address. The contents of GPR index and GPR base are added to form the effective address. The effective address is doubleword aligned; the last three bits of the effective address are ignored.



The effective address is formed by adding the contents of an index value in a general purpose register (GPR) to a base address in another GPR while ignoring the lowest three bits of the effective address by interpreting them as 0s. That is, the effective address comprises bits 3 to 63. The ignored three bits contain the byte address for accessing individual bytes within a doubleword. If the size of a vector register is 64 bits, the content of the vector register is stored into memory. If, on the other hand, the size of a vector register is 32 bits, the 32 bits of data in the vector register, which form the lower half, are concatenated with the 32 bits of data contained in the next register in sequence, which form the upper half. The concatenated 64-bit doubleword is then stored into memory at the address specified by the effective address.
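The store path is the mirror image of the load path. The sketch below keeps the same hypothetical doubleword-array memory model; store_64 and store_32_pair are illustrative names, not instructions defined in the text.

```c
#include <stdint.h>

/* Hypothetical memory model: doubleword at byte address ea is mem[ea >> 3]. */
static uint64_t ea_doubleword(uint64_t gpr_index, uint64_t gpr_base)
{
    return (gpr_index + gpr_base) & ~(uint64_t)7;  /* low 3 bits read as 0s */
}

/* Store from a 64-bit register fs. */
void store_64(uint64_t *mem, uint64_t gpr_index, uint64_t gpr_base, uint64_t fs)
{
    mem[ea_doubleword(gpr_index, gpr_base) >> 3] = fs;
}

/* Store from a pair of 32-bit registers: regs[0] (the named register)
   supplies the lower 32 bits, regs[1] (the next register) the upper 32. */
void store_32_pair(uint64_t *mem, uint64_t gpr_index, uint64_t gpr_base,
                   const uint32_t regs[2])
{
    uint64_t dw = ((uint64_t)regs[1] << 32) | regs[0];
    store_64(mem, gpr_index, gpr_base, dw);
}
```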
ALIGNMENT INSTRUCTION

The present alignment instruction operates on two 64-bit doublewords loaded into two registers from memory by issuing two load instructions. One doubleword is loaded into a first register (vs) and the other doubleword is loaded into a second register (vt). The alignment instruction generates a 64-bit doubleword vector in a third register (vd) aligned for a SIMD vector operation. Preferably, an alignment unit aligns a vector by means of a funnel shift that extracts an aligned 64-bit vector of elements from the two 64-bit registers.


Figure 4 illustrates a block diagram of an alignment unit in a processor for aligning a vector of elements. The vector load/store unit 404 loads two vectors from main memory 402 into two vector registers, vs and vt, in a register file 408. The alignment unit 410 receives the two vectors in the vector registers vs and vt and extracts a byte-aligned vector. Three control lines 412, representing the three bits of the byte address, control the byte alignment performed by the alignment unit 410. The aligned vector is then forwarded to an exemplary vector register, vd, in the register file.

The alignment of a vector depends on the byte ordering mode of the processor. Byte ordering within a larger data size, such as a 64-bit doubleword, may be configured in either big-endian or little-endian order. Endian order refers to the location of byte 0 within multi-byte data. A processor according to the present invention may be configured as either a big-endian or a little-endian system. For example, in a little-endian system, byte 0 is the least significant (i.e., rightmost) byte. On the other hand, in a big-endian system, byte 0 is the most significant (i.e., leftmost) byte. In the present invention, an exemplary processor uses byte addressing for a doubleword access, which is aligned on a byte boundary divisible by eight (i.e., 0, 8, 16, ..., 56). Hence, a 64-bit doubleword loaded into a register in a processor is byte-aligned in either a big-endian or a little-endian mode. For a little-endian mode processor, the starting (i.e., first) byte for a vector to be extracted lies in the second vector register. Conversely, for a big-endian mode processor, the starting (i.e., first) byte for the vector resides in the first vector register.
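A small C sketch of what the two byte orders mean for locating byte n of a doubleword held in a register. The doubleword is modeled as a uint64_t value and the function name byte_n is hypothetical.

```c
#include <stdint.h>

/* Return byte number n (0..7) of a doubleword under the given byte order.
   Little-endian: byte 0 is the least significant (rightmost) byte.
   Big-endian:    byte 0 is the most significant (leftmost) byte. */
unsigned byte_n(uint64_t dw, unsigned n, int big_endian)
{
    unsigned shift = big_endian ? 8u * (7u - n) : 8u * n;
    return (unsigned)((dw >> shift) & 0xFFu);
}
```

For the doubleword 0x0011223344556677, byte 0 is 0x77 in little-endian order and 0x00 in big-endian order.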

Figure 5 illustrates a flow diagram of the steps involved in extracting an aligned vector from two exemplary vectors. In step 502, two 64-bit doublewords are loaded from a memory unit into two 64-bit registers. One 64-bit doubleword is loaded into a first register and the other 64-bit doubleword in memory is loaded into the second register. Preferably, the former doubleword and the next doubleword are stored in contiguous memory space, and their starting addresses differ by 64 bits, or 8 bytes. The loading of the doublewords is accomplished through a load/store unit according to the load instruction described above.

The starting byte address of the aligned vector to be extracted is then determined in step 504. According to the preferred embodiment, the registers and vectors are all 64 bits wide. Since a 64-bit doubleword contains 8 bytes, three bits are needed to specify all the byte positions in a 64-bit doubleword. Hence, the preferred embodiment uses 3 bits to specify the position of the starting byte address in a 64-bit vector.


In one embodiment of the present invention, an alignment instruction provides an immediate, which is a constant byte address within a doubleword. Preferably, the immediate consists of 3 bits for specifying a constant byte address to a byte among the 8 bytes of the first register (i.e., little-endian mode processor) or of the second register (i.e., big-endian mode processor). This alignment instruction performs a constant alignment of a vector. The align amount is computed by masking the immediate, then using that value to control a funnel shift of vector vs concatenated with vector vt. The operands can be in the QH, OB, or BW format.

In an alternative embodiment, the alignment instruction provides variable byte addressing by specifying a general purpose register (GPR) that contains the starting byte address in the first register. This instruction accesses the GPR by using the register address provided in the alignment instruction. Then, the instruction extracts the lower 3 bits in the GPR to obtain the starting byte address in the first register (i.e., little-endian mode) or the second register (i.e., big-endian mode). The align amount is computed by masking the contents of GPR rs, then using that value to control a funnel shift of vector vs concatenated with vector vt. The operands can be in QH, OB, or BW format.
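Both alignment forms reduce to "mask to three bits, then funnel shift". The sketch below assumes a big-endian-style view in which vs supplies the more significant half of the 128-bit concatenation vs||vt and byte 0 is leftmost; the function names align_amount and funnel_align are hypothetical, and the little-endian arrangement would swap the roles of the halves.

```c
#include <stdint.h>

/* Align amount: only the low three bits of the immediate (constant form)
   or of GPR rs (variable form) are used. */
unsigned align_amount(uint64_t imm_or_rs)
{
    return (unsigned)(imm_or_rs & 7u);
}

/* Funnel shift of vs concatenated with vt (vs in the upper 64 bits):
   the result is the 64 bits starting sa bytes into vs. */
uint64_t funnel_align(uint64_t vs, uint64_t vt, unsigned sa)
{
    unsigned shift = 8u * (sa & 7u);
    if (shift == 0)
        return vs;                /* avoids an undefined 64-bit shift in C */
    return (vs << shift) | (vt >> (64u - shift));
}
```

With vs = 0x0001020304050607 and vt = 0x08090A0B0C0D0E0F, an align amount of 3 yields 0x030405060708090A, i.e., bytes 3 through 10 of the concatenation.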

After the starting byte address is determined in step 504 of the flowchart in Figure 5, the first bit of the starting byte address is determined in step 506 by multiplying the starting byte address by 8. For example, if the starting byte address were 3, the first bit of the starting byte address would be 3*8, or 24. Then, in step 508, a 64-bit doubleword is extracted by concatenating bits from the first bit at the starting byte address in one register, continuing through the other register. This concatenation is accomplished by funnel shifting from the first bit of the starting byte. Specifically, the first register is assigned bit positions 0 to 63, and the second register is assigned the next 64 bit positions, 64 to 127. The extraction scheme depends on the byte ordering mode. A variable s, representing the first bit position at the starting byte address, can be used to simplify the illustration of the differences between the byte ordering modes. In a big-endian byte mode, the concatenation occurs from bit position 127-s to 64-s. Conversely, in a little-endian byte mode, the concatenation occurs from bit position s through 63+s.
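Steps 506 and 508 can be expressed directly in terms of the bit numbering just given, where the first register holds bits 0 to 63 and the second register holds bits 64 to 127. This is a sketch only; extract64 and extract_aligned are hypothetical helper names, and the 128-bit value is modeled as two uint64_t halves.

```c
#include <stdint.h>

/* Bits lo .. lo+63 of the 128-bit value whose bits 0-63 are `first`
   and bits 64-127 are `second` (0 <= lo <= 64). */
uint64_t extract64(uint64_t first, uint64_t second, unsigned lo)
{
    if (lo == 0)  return first;
    if (lo >= 64) return second;
    return (first >> lo) | (second << (64u - lo));
}

/* Step 506: s = starting byte address * 8.
   Step 508: little-endian takes bits s .. 63+s,
             big-endian takes bits 64-s .. 127-s. */
uint64_t extract_aligned(uint64_t first, uint64_t second,
                         unsigned start_byte, int big_endian)
{
    unsigned s = 8u * (start_byte & 7u);
    return extract64(first, second, big_endian ? 64u - s : s);
}
```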

Then, in step 510, the extracted vector is replicated into a destination register in the register file for SIMD vector processing. In an alternative embodiment, the extracted vector may be stored into the memory unit for later use. The process then terminates in step 512.

SHUFFLE INSTRUCTION

The shuffle instruction according to the present invention provides a vector of ordered elements selected from either one or two other vector registers. One or more load/store instructions are used to load the vector(s) into registers for the shuffle operation. One embodiment uses a full byte-mode crossbar to generate a vector of elements selected from the elements of two other exemplary vectors. That is, selected elements of the exemplary vectors vs and vt are merged into a new exemplary vector, vd. The new vector, vd, contains elements aligned for SIMD operation. Alternatively, a plurality of shuffle operations may be carried out to arrange the elements in a desired order for SIMD vector processing.

Figure 6A illustrates a block diagram of a full byte-mode crossbar circuit 600 used in generating a vector of elements from elements of two registers. First, two vectors from a memory unit are loaded into two exemplary registers in a processor; the elements of the first vector are loaded into the first register, vs 602, and the elements of the second vector are loaded into the second register, vt 604. The elements of these two vector registers, vs 602 and vt 604, serve as source elements. The crossbar circuit 600 receives as input each of the elements from the two vector registers in parallel. A set of control lines 608 is coupled to the crossbar circuit 600 to relay a specific shuffle instruction operation. The shuffle instruction operation encodes a destination element for each of the selected source elements. In response to the specific shuffle instruction operation signals, the crossbar circuit 600 selects a set of elements from the two registers, vs 602 and vt 604, and routes or replicates each element to its associated destination element in an exemplary destination register, vd 606.

In addition, the present invention allows zeroing and sign extension of elements. For example, with reference to Figure 6A, the present invention provides either zeroing or sign extension for each element in the first register, vs 602. In addition to providing all of their bits to the crossbar circuit 600, elements 0 through 7 in the first register, vs 602, provide their corresponding sign bits 612, 614, 616, 618, 620, 622, 624, and 626 (612 through 626) to the associated AND gates 628, 630, 632, 634, 636, 638, 640, and 642 (628 through 642). Each of the AND gates 628 through 642 also receives as its other input a control signal 610, which originates from a specific shuffle instruction and specifies either zeroing or sign extension mode.


Figure 6B shows a more detailed diagram of the operation of the exemplary AND gate 628 associated with element 7 in the first register, vs 602. The AND gate 628 receives a single sign bit 612 from the most significant bit of element 7 in the first register, vs 602. The AND gate 628 also receives the control signal 610. To provide zeroing for element 7, for example, the control signal 610 inputs a 0 into the AND gate 628. In this case, the output 652 of the AND gate 628 is 0 regardless of the value of the sign bit 612. On the other hand, when the control signal is 1, the AND gate 628 passes the sign bit 612 through as the output 652, whatever its value. In both cases, zeroing and sign extension, the output 652 is routed to a plurality of output lines 654 that replicate the output signal to an appropriate width. Preferably, the number of output lines 654 matches the number of bits in each element in the first register, vs 602. The crossbar circuit 600 accepts the signals on these output lines 654 and uses them to zero or sign extend element 7 when necessary according to a shuffle instruction. The AND gates for the other elements 0 to 6 operate in a similar manner to provide zeroing and sign extension bit signals to the crossbar circuit 600.
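The AND-gate path can be sketched in software as follows. This is an illustrative model, not a gate-level description: and_gate, fill_lines, and extend8 are hypothetical names, and extend8 assumes an 8-bit element widened to 16 bits.

```c
#include <stdint.h>

/* One AND gate: output = sign bit AND control signal.
   control = 0 -> zeroing; control = 1 -> sign extension. */
unsigned and_gate(unsigned sign_bit, unsigned control)
{
    return (sign_bit & control) & 1u;
}

/* Replicate the gate output across `width` output lines (width <= 16),
   giving the fill pattern placed above an extended element. */
uint16_t fill_lines(unsigned gate_out, unsigned width)
{
    return gate_out ? (uint16_t)((1u << width) - 1u) : 0;
}

/* Widen an 8-bit element to 16 bits using the replicated fill bits. */
uint16_t extend8(uint8_t elem, unsigned control)
{
    unsigned sign = (elem >> 7) & 1u;   /* most significant bit of the element */
    return (uint16_t)((fill_lines(and_gate(sign, control), 8) << 8) | elem);
}
```

For the element 0x85, a control value of 0 gives 0x0085 (zeroing) and a control value of 1 gives 0xFF85 (sign extension).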

The preferred embodiment of the present invention operates on vectors of elements in a preferred OB or QH mode. In OB mode, a 64-bit doubleword vector is interpreted as having eight 8-bit elements. In QH mode, the 64-bit vector is treated as containing four 16-bit elements. For example, in OB mode, the crossbar circuit 600 selects in parallel, as source elements, eight 8-bit elements among the elements in the registers vs 602 and vt 604. Each of the eight elements is then replicated or routed into a particular destination element in the destination vector register, vd 606. In QH mode, the crossbar circuit selects four 16-bit elements and replicates or routes each element into a particular destination element in the destination register. Those skilled in the art will appreciate that the crossbar circuit represents one embodiment of the present invention for implementing the shuffle instruction operations. Crossbar circuits are well known in the art and are commonly used in conjunction with vector processing units.
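A software model of the crossbar selection in OB and QH modes is sketched below. It assumes that element i of a register occupies bits bits*i to bits*i+bits-1 (element 0 least significant), and it uses a hypothetical selector array sel[] in place of the control lines 608: values 0..n-1 name vs elements and n..2n-1 name vt elements. The function name crossbar is also hypothetical.

```c
#include <stdint.h>

/* bits = 8 for OB mode (8 elements), 16 for QH mode (4 elements).
   sel[i] names the source element routed or replicated into vd[i]:
   0..n-1 select vs[0..n-1]; n..2n-1 select vt[0..n-1]. */
uint64_t crossbar(uint64_t vs, uint64_t vt, const unsigned sel[], unsigned bits)
{
    unsigned n = 64u / bits;
    uint64_t mask = (1ULL << bits) - 1u;
    uint64_t vd = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned s = sel[i] % (2u * n);
        uint64_t src = (s < n) ? vs : vt;
        uint64_t e = (src >> ((s % n) * bits)) & mask;
        vd |= e << (i * bits);
    }
    return vd;
}
```

Because any destination may name any source, the same model covers routing (each source used once) and replication (a source named more than once).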

Figure 7 illustrates shuffle operations for ordering 8-bit elements in a 64-bit doubleword. Each row represents the destination vector register, vd, comprising 8 elements, vd[0] to vd[7]. The first row 702 consists of placeholders indicating the 8 elements. Below the first row 702 are 8 different shuffle operations in OB mode, as indicated by the content of the destination vector register, vd, in each of rows 704 to 718. These shuffle operations in OB mode are illustrated in Figures 8A through 8H.

Figure 8A illustrates a block diagram of a shuffle operation that converts the four unsigned upper bytes (i.e., 8 bits each) in a source register to four 16-bit halves in a destination register. This shuffle operation, represented by the mnemonic UPUH.OB, selects the upper four 8-bit elements in an exemplary vector register, vs. The selected elements vs[4], vs[5], vs[6], and vs[7] are replicated into destination elements vd[0], vd[2], vd[4], and vd[6], respectively. The odd elements of the destination vector register, vd[1], vd[3], vd[5], and vd[7], are zeroed.

Figure 8B illustrates a block diagram of a shuffle operation that converts a vector of four unsigned low bytes in a register to 16-bit halves. This shuffle operation, represented by the mnemonic UPUL.OB, selects the lower four 8-bit elements in an exemplary vector register, vs. The selected elements vs[0], vs[1], vs[2], and vs[3] are replicated into destination elements vd[0], vd[2], vd[4], and vd[6], respectively. The odd elements of the destination vector register, vd[1], vd[3], vd[5], and vd[7], are zeroed.

Figure 8C illustrates a block diagram of a shuffle operation that converts a vector of four signed upper bytes in a register to 16-bit halves. This shuffle operation, represented by the mnemonic UPSH.OB, selects the upper four 8-bit elements in an exemplary vector register, vs. The selected elements vs[4], vs[5], vs[6], and vs[7] are replicated into destination elements vd[0], vd[2], vd[4], and vd[6], respectively. The odd elements of the destination vector register, vd[1], vd[3], vd[5], and vd[7], replicate the sign bits of the selected elements vs[4], vs[5], vs[6], and vs[7], respectively.

Figure 8D illustrates a block diagram of a shuffle operation that converts a vector of four signed low bytes in a register to 16-bit halves. This shuffle operation, represented by the mnemonic UPSL.OB, selects the lower four 8-bit elements in an exemplary vector register, vs. The selected elements vs[0], vs[1], vs[2], and vs[3] are replicated into destination elements vd[0], vd[2], vd[4], and vd[6], respectively. The odd elements of the destination vector register, vd[1], vd[3], vd[5], and vd[7], replicate the sign bits of the selected elements vs[0], vs[1], vs[2], and vs[3], respectively.
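The four unpack operations of Figures 8A through 8D differ only in which half of vs they read and in how the odd destination bytes are filled. A sketch under the assumption that OB element i occupies bits 8i to 8i+7 (element 0 least significant); the C function names mirror the mnemonics but are hypothetical.

```c
#include <stdint.h>

/* OB element i occupies bits 8i..8i+7 (element 0 least significant). */
static uint64_t ob(uint64_t v, unsigned i) { return (v >> (8 * i)) & 0xFFu; }

/* Copy vs[base..base+3] into vd[0], vd[2], vd[4], vd[6]; fill the odd
   bytes with 0 (unsigned forms) or with the element's sign (signed forms). */
static uint64_t unpack(uint64_t vs, unsigned base, int is_signed)
{
    uint64_t vd = 0;
    for (unsigned k = 0; k < 4; k++) {
        uint64_t e = ob(vs, base + k);
        uint64_t fill = (is_signed && (e & 0x80u)) ? 0xFFu : 0x00u;
        vd |= e << (16 * k);            /* vd[2k]: the selected byte   */
        vd |= fill << (16 * k + 8);     /* vd[2k+1]: zero or sign fill */
    }
    return vd;
}

uint64_t upuh_ob(uint64_t vs) { return unpack(vs, 4, 0); }  /* UPUH.OB */
uint64_t upul_ob(uint64_t vs) { return unpack(vs, 0, 0); }  /* UPUL.OB */
uint64_t upsh_ob(uint64_t vs) { return unpack(vs, 4, 1); }  /* UPSH.OB */
uint64_t upsl_ob(uint64_t vs) { return unpack(vs, 0, 1); }  /* UPSL.OB */
```

Each result can be read as four 16-bit halves, which is how these operations support 16-bit computation on 8-bit data.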

Figure 8E illustrates a block diagram of a shuffle operation that replicates the odd elements of the eight 8-bit elements from each of two source registers into 8 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic PACH.OB, selects the odd elements of the eight 8-bit elements in exemplary source vector registers vs and vt. The elements selected from vs, namely vs[1], vs[3], vs[5], and vs[7], are replicated into destination elements vd[4], vd[5], vd[6], and vd[7], respectively. The elements vt[1], vt[3], vt[5], and vt[7] from the vector register vt are replicated into destination elements vd[0], vd[1], vd[2], and vd[3], respectively.

Figure 8F illustrates a block diagram of a shuffle operation that replicates the even elements of the eight 8-bit elements from each of two source registers into 8 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic PACL.OB, selects the even elements of the eight 8-bit elements in exemplary source vector registers vs and vt. The elements selected from vs, namely vs[0], vs[2], vs[4], and vs[6], are replicated into destination elements vd[4], vd[5], vd[6], and vd[7], respectively. The elements vt[0], vt[2], vt[4], and vt[6] from the vector register vt are replicated into destination elements vd[0], vd[1], vd[2], and vd[3], respectively.
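The pack operations of Figures 8E and 8F can be sketched as below, again assuming that OB element i occupies bits 8i to 8i+7. The names pach_ob and pacl_ob are illustrative stand-ins for the PACH.OB and PACL.OB mnemonics.

```c
#include <stdint.h>

/* OB element i occupies bits 8i..8i+7 (element 0 least significant). */
static uint64_t ob(uint64_t v, unsigned i) { return (v >> (8 * i)) & 0xFFu; }

/* first = 1 picks the odd elements (PACH.OB), first = 0 the even ones
   (PACL.OB). Picks from vt fill vd[0..3]; picks from vs fill vd[4..7]. */
static uint64_t pack_ob(uint64_t vs, uint64_t vt, unsigned first)
{
    uint64_t vd = 0;
    for (unsigned k = 0; k < 4; k++) {
        unsigned src = first + 2 * k;            /* 1,3,5,7 or 0,2,4,6 */
        vd |= ob(vt, src) << (8 * k);            /* vd[k]   from vt    */
        vd |= ob(vs, src) << (8 * (k + 4));      /* vd[k+4] from vs    */
    }
    return vd;
}

uint64_t pach_ob(uint64_t vs, uint64_t vt) { return pack_ob(vs, vt, 1); }
uint64_t pacl_ob(uint64_t vs, uint64_t vt) { return pack_ob(vs, vt, 0); }
```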

Figure 8G illustrates a block diagram of a shuffle operation that replicates the upper 4 elements of the eight 8-bit elements from each of two source registers into 8 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic MIXH.OB, selects the upper 4 elements of the eight 8-bit elements in exemplary source vector registers vs and vt. The elements selected from vs, namely vs[4], vs[5], vs[6], and vs[7], are replicated into the odd elements of the destination vector register, namely vd[1], vd[3], vd[5], and vd[7], respectively. The elements vt[4], vt[5], vt[6], and vt[7] from the vector register vt are replicated into the even destination elements vd[0], vd[2], vd[4], and vd[6], respectively.


Figure 8H illustrates a block diagram of a shuffle operation that replicates the lower 4 elements of the eight 8-bit elements from each of two source registers into 8 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic MIXL.OB, selects the lower 4 elements of the eight 8-bit elements in exemplary source vector registers vs and vt. The elements selected from vs, namely vs[0], vs[1], vs[2], and vs[3], are replicated into the odd elements of the destination vector register, namely vd[1], vd[3], vd[5], and vd[7], respectively. The elements vt[0], vt[1], vt[2], and vt[3] from the vector register vt are replicated into the even destination elements vd[0], vd[2], vd[4], and vd[6], respectively.
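The mix operations of Figures 8G and 8H interleave one half of vs with the same half of vt. A sketch assuming OB element i occupies bits 8i to 8i+7; mixh_ob and mixl_ob are hypothetical C names for MIXH.OB and MIXL.OB.

```c
#include <stdint.h>

/* OB element i occupies bits 8i..8i+7 (element 0 least significant). */
static uint64_t ob(uint64_t v, unsigned i) { return (v >> (8 * i)) & 0xFFu; }

/* vt[base+k] -> vd[2k] (even), vs[base+k] -> vd[2k+1] (odd). */
static uint64_t mix_ob(uint64_t vs, uint64_t vt, unsigned base)
{
    uint64_t vd = 0;
    for (unsigned k = 0; k < 4; k++)
        vd |= (ob(vt, base + k) << (16 * k)) | (ob(vs, base + k) << (16 * k + 8));
    return vd;
}

uint64_t mixh_ob(uint64_t vs, uint64_t vt) { return mix_ob(vs, vt, 4); }  /* upper 4 */
uint64_t mixl_ob(uint64_t vs, uint64_t vt) { return mix_ob(vs, vt, 0); }  /* lower 4 */
```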

A shuffle instruction operating in QH mode generates a new vector of elements for two types of operations. The first type of operation creates a vector of new data sizes by converting data sizes between 16-bit elements and 32-bit elements in a vector. The second type creates a new vector of elements drawn from two other vectors. The present exemplary data type conversion operations enable computation in a format with a larger range than the storage format, such as 32-bit computation on 16-bit numbers. In addition, the present embodiment operations allow conversion of a data set from a smaller range format to a larger range format, or vice versa, as between 16-bit and 32-bit data.


Figure 9 illustrates shuffle operations for ordering 16-bit elements in a 64-bit doubleword. Each row represents the destination vector register, vd, comprising 4 elements, vd[0] to vd[3]. The first row 902 consists of placeholders indicating the 4 elements. Below the first row 902 are 8 different shuffle operations in QH mode, as indicated by the content of the destination vector register, vd, in each of rows 904 to 918. These shuffle operations in QH mode are illustrated in Figures 10A through 10H.
Figure 10A illustrates a block diagram of a shuffle operation that replicates the upper 2 elements of the four 16-bit elements from each of two source registers into 4 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic MIXH.QH, selects the upper 2 elements of the four 16-bit elements in exemplary source vector registers vs and vt. The elements selected from vs, namely vs[2] and vs[3], are replicated into the odd elements of the destination vector register, namely vd[1] and vd[3], respectively. The elements vt[2] and vt[3] from the vector register vt are replicated into the even destination elements vd[0] and vd[2], respectively.

Figure 10B illustrates a block diagram of a shuffle operation that replicates the lower 2 elements of the four 16-bit elements from each of two source registers into 4 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic MIXL.QH, selects the lower 2 elements of the four 16-bit elements in exemplary source vector registers vs and vt. The elements selected from vs, namely vs[0] and vs[1], are replicated into the odd elements of the destination vector register, namely vd[1] and vd[3], respectively. The elements vt[0] and vt[1] from the vector register vt are replicated into the even destination elements vd[0] and vd[2], respectively.

Figure 10C illustrates a block diagram of a shuffle operation that replicates the 2 odd elements of the four 16-bit elements from each of two source registers into 4 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic PACH.QH, selects the 2 odd elements of the four 16-bit elements in exemplary source vector registers vs and vt. The elements selected from vs, namely vs[1] and vs[3], are replicated into the upper 2 elements of the destination vector register, namely vd[2] and vd[3], respectively. The elements vt[1] and vt[3] from the vector register vt are replicated into the lower 2 destination elements vd[0] and vd[1], respectively.

Figure 10D illustrates a block diagram of a shuffle operation that replicates the 2 even elements of the four 16-bit elements from each of two source registers into 4 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic PACL.QH, selects the 2 even elements of the four 16-bit elements in exemplary source vector registers vs and vt. The elements selected from vs, namely vs[0] and vs[2], are replicated into the upper 2 elements of the destination vector register, namely vd[2] and vd[3], respectively. The elements vt[0] and vt[2] from the vector register vt are replicated into the lower 2 destination elements vd[0] and vd[1], respectively.
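The QH-mode mix and pack operations of Figures 10A through 10D each route four 16-bit elements. A sketch assuming QH element i occupies bits 16i to 16i+15 (element 0 least significant); the function names are hypothetical counterparts of the mnemonics.

```c
#include <stdint.h>

/* QH element i occupies bits 16i..16i+15 (element 0 least significant). */
static uint64_t qh(uint64_t v, unsigned i) { return (v >> (16 * i)) & 0xFFFFu; }

/* Assemble vd from its elements vd[0]..vd[3]. */
static uint64_t qh4(uint64_t e0, uint64_t e1, uint64_t e2, uint64_t e3)
{
    return e0 | (e1 << 16) | (e2 << 32) | (e3 << 48);
}

/* MIXH.QH: vt[2], vs[2], vt[3], vs[3] -> vd[0..3]. */
uint64_t mixh_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 2), qh(vs, 2), qh(vt, 3), qh(vs, 3)); }
/* MIXL.QH: vt[0], vs[0], vt[1], vs[1] -> vd[0..3]. */
uint64_t mixl_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 0), qh(vs, 0), qh(vt, 1), qh(vs, 1)); }
/* PACH.QH: odd elements; vt's go low, vs's go high. */
uint64_t pach_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 1), qh(vt, 3), qh(vs, 1), qh(vs, 3)); }
/* PACL.QH: even elements; vt's go low, vs's go high. */
uint64_t pacl_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 0), qh(vt, 2), qh(vs, 0), qh(vs, 2)); }
```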

Figure 10E illustrates a block diagram of a shuffle operation that replicates the even elements from one source register and the odd elements from another source register into a destination vector register. This shuffle operation, represented by an exemplary mnemonic BFLA.QH, selects the 2 even elements of the four 16-bit elements from an exemplary source vector register, vs. The shuffle operation also selects the 2 odd elements of the four 16-bit elements from another exemplary source vector register, vt. The even elements selected from vs, namely vs[0] and vs[2], are replicated into the 2 odd elements of the destination vector register, namely vd[1] and vd[3], respectively. The odd elements vt[1] and vt[3] from the vector register vt are replicated into the 2 even destination elements vd[0] and vd[2], respectively.

Figure 10F illustrates a block diagram of a shuffle operation that replicates the even elements from one source register and the odd elements from another source register into a destination vector register. This shuffle operation, represented by an exemplary mnemonic BFLB.QH, selects the 2 even elements of the four 16-bit elements from an exemplary source vector register, vs. The shuffle operation also selects the 2 odd elements of the four 16-bit elements from another exemplary source vector register, vt. The even elements selected from vs, namely vs[0] and vs[2], are replicated into the 2 odd elements of the destination vector register in reverse order, namely vd[3] and vd[1], respectively. The odd elements vt[1] and vt[3] from the vector register vt are replicated into the 2 even destination elements in reverse order, namely vd[2] and vd[0], respectively.
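The two butterfly operations can be sketched as follows, assuming QH element i occupies bits 16i to 16i+15. When the same register is passed as both vs and vt (an assumption made here for illustration), BFLA.QH swaps adjacent pairs and BFLB.QH reverses all four elements. The names bfla_qh and bflb_qh are hypothetical.

```c
#include <stdint.h>

/* QH element i occupies bits 16i..16i+15 (element 0 least significant). */
static uint64_t qh(uint64_t v, unsigned i) { return (v >> (16 * i)) & 0xFFFFu; }

static uint64_t qh4(uint64_t e0, uint64_t e1, uint64_t e2, uint64_t e3)
{
    return e0 | (e1 << 16) | (e2 << 32) | (e3 << 48);
}

/* BFLA.QH: vt[1] -> vd[0], vs[0] -> vd[1], vt[3] -> vd[2], vs[2] -> vd[3]. */
uint64_t bfla_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 1), qh(vs, 0), qh(vt, 3), qh(vs, 2)); }

/* BFLB.QH: vt[3] -> vd[0], vs[2] -> vd[1], vt[1] -> vd[2], vs[0] -> vd[3]. */
uint64_t bflb_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 3), qh(vs, 2), qh(vt, 1), qh(vs, 0)); }
```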

Figure 10G illustrates a block diagram of a shuffle operation that replicates the upper 2 elements of the four 16-bit elements from each of two source registers into a destination vector register. This shuffle operation, represented by an exemplary mnemonic REPA.QH, selects the upper 2 elements of the four 16-bit elements in exemplary source vector registers vs and vt. The upper elements selected from vs, namely vs[2] and vs[3], are replicated into the upper elements of the destination vector register, namely vd[2] and vd[3], respectively. The upper elements vt[2] and vt[3] from the vector register vt are replicated into the lower destination elements vd[0] and vd[1], respectively.



SGM5-4-457.00 



October 7, 1997 



Figure 10H illustrates a block diagram of a shuffle operation that replicates the lower 2 elements of the four 16-bit elements from each of two source registers into a destination vector register. This shuffle operation, represented by an exemplary mnemonic REPB.QH, selects the lower 2 elements of the four 16-bit elements in exemplary source vector registers vs and vt. The lower elements selected from vs, namely vs[0] and vs[1], are replicated into the upper elements of the destination vector register, namely vd[2] and vd[3], respectively. The lower elements vt[0] and vt[1] from the vector register vt are replicated into the lower destination elements vd[0] and vd[1], respectively.
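REPA.QH and REPB.QH keep element pairs together, copying a two-element half of each source into the destination. A sketch following the element mapping of Figures 10G and 10H, assuming QH element i occupies bits 16i to 16i+15; repa_qh and repb_qh are hypothetical names.

```c
#include <stdint.h>

/* QH element i occupies bits 16i..16i+15 (element 0 least significant). */
static uint64_t qh(uint64_t v, unsigned i) { return (v >> (16 * i)) & 0xFFFFu; }

static uint64_t qh4(uint64_t e0, uint64_t e1, uint64_t e2, uint64_t e3)
{
    return e0 | (e1 << 16) | (e2 << 32) | (e3 << 48);
}

/* REPA.QH: vt[2], vt[3] -> vd[0], vd[1]; vs[2], vs[3] -> vd[2], vd[3]. */
uint64_t repa_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 2), qh(vt, 3), qh(vs, 2), qh(vs, 3)); }

/* REPB.QH: vt[0], vt[1] -> vd[0], vd[1]; vs[0], vs[1] -> vd[2], vd[3]. */
uint64_t repb_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 0), qh(vt, 1), qh(vs, 0), qh(vs, 1)); }
```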

The shuffle instructions allow more efficient SIMD vector operations. First, the shuffle operations create a vector of new data sizes by converting between 8-bit elements and 16-bit elements in a vector. These data type conversions enable computation in a format with a larger range than the storage format, such as 16-bit computation on 8-bit numbers. For example, these operations allow conversion of a data set from a smaller range format to a larger range format, or vice versa, as between 8-bit and 16-bit audio or video data.

Second, the shuffle operations are also useful in interleaving and deinterleaving data. For example, some applications store multiple channel data in separate arrays, or interleaved in a single array. These applications typically require interleaving or deinterleaving the multiple channels. In these applications, separate R, G, B, and A byte arrays may be converted into an interleaved RGBA array by the following series of shuffle instructions:

MIXL.OB RGL, R, G              ; RGRGRGRG
MIXL.OB BAL, B, A              ; BABABABA
MIXH.OB RGH, R, G              ; RGRGRGRG
MIXH.OB BAH, B, A              ; BABABABA
MIXL.QH RGBALL, RGL, BAL       ; RGBARGBA
MIXH.QH RGBALH, RGL, BAL       ; RGBARGBA
MIXL.QH RGBAHL, RGH, BAH       ; RGBARGBA
MIXH.QH RGBAHH, RGH, BAH       ; RGBARGBA
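The interleaving sequence above can be replayed in software. This sketch assumes OB element i occupies bits 8i to 8i+7 and QH element i occupies bits 16i to 16i+15, and it treats the second-stage mixes as operating on 16-bit (QH) elements, each holding one RG or BA pair. The names interleave_rgba, mix_ob, and mix_qh are hypothetical, and the resulting values show the register pattern only, with no claim about memory byte order.

```c
#include <stdint.h>

static uint64_t ob(uint64_t v, unsigned i) { return (v >> (8 * i)) & 0xFFu; }
static uint64_t qh(uint64_t v, unsigned i) { return (v >> (16 * i)) & 0xFFFFu; }

/* MIXH.OB (base 4) / MIXL.OB (base 0): vt[base+k] -> vd[2k], vs[base+k] -> vd[2k+1]. */
static uint64_t mix_ob(uint64_t vs, uint64_t vt, unsigned base)
{
    uint64_t vd = 0;
    for (unsigned k = 0; k < 4; k++)
        vd |= (ob(vt, base + k) << (16 * k)) | (ob(vs, base + k) << (16 * k + 8));
    return vd;
}

/* MIXH.QH (base 2) / MIXL.QH (base 0): same pattern on 16-bit elements. */
static uint64_t mix_qh(uint64_t vs, uint64_t vt, unsigned base)
{
    uint64_t vd = 0;
    for (unsigned k = 0; k < 2; k++)
        vd |= (qh(vt, base + k) << (32 * k)) | (qh(vs, base + k) << (32 * k + 16));
    return vd;
}

/* Eight shuffles turn four 8-byte channel registers into four RGBA registers. */
void interleave_rgba(uint64_t r, uint64_t g, uint64_t b, uint64_t a, uint64_t out[4])
{
    uint64_t rgl = mix_ob(r, g, 0), bal = mix_ob(b, a, 0);   /* MIXL.OB */
    uint64_t rgh = mix_ob(r, g, 4), bah = mix_ob(b, a, 4);   /* MIXH.OB */
    out[0] = mix_qh(rgl, bal, 0);                            /* RGBALL  */
    out[1] = mix_qh(rgl, bal, 2);                            /* RGBALH  */
    out[2] = mix_qh(rgh, bah, 0);                            /* RGBAHL  */
    out[3] = mix_qh(rgh, bah, 2);                            /* RGBAHH  */
}
```

With channel bytes R = 0xA0+i, G = 0xB0+i, B = 0xC0+i, and A = 0xD0+i for element i, the first output register is 0xA1B1C1D1A0B0C0D0, i.e., the RGBARGBA pattern for pixels 1 and 0.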

Conversely, an interleaved RGBA array may be deinterleaved into separate R, G, B, and A arrays by the following series of shuffle instructions:

PACL.OB GA0GA1, RGBA0, RGBA1
PACH.OB RB0RB1, RGBA0, RGBA1
PACL.OB GA2GA3, RGBA2, RGBA3
PACH.OB RB2RB3, RGBA2, RGBA3
PACL.OB A0A1A2A3, GA0GA1, GA2GA3
PACH.OB G0G1G2G3, GA0GA1, GA2GA3
PACL.OB B0B1B2B3, RB0RB1, RB2RB3
PACH.OB R0R1R2R3, RB0RB1, RB2RB3



Third, some algorithms operate on two-dimensional arrays of data, such as images. Such an array typically orders its elements along a major axis, where the elements are consecutive, and a minor axis, where the elements are separated by the size of the major axis. Often, a transpose operation is performed on the two-dimensional array, converting the major axis to the minor axis and vice versa. A common example is a discrete cosine transform (DCT), which requires transposing an 8x8 block of an array. In this example, the 8x8 block of the array consists of the following elements:





        d0  d1  d2  d3  d4  d5  d6  d7
  s0    A0  B0  C0  D0  E0  F0  G0  H0
  s1    A1  B1  C1  D1  E1  F1  G1  H1
  s2    A2  B2  C2  D2  E2  F2  G2  H2
  s3    A3  B3  C3  D3  E3  F3  G3  H3
  s4    A4  B4  C4  D4  E4  F4  G4  H4
  s5    A5  B5  C5  D5  E5  F5  G5  H5
  s6    A6  B6  C6  D6  E6  F6  G6  H6
  s7    A7  B7  C7  D7  E7  F7  G7  H7
The present invention can transpose the 8x8 block in OB mode in 24 instructions, as follows:

MIXH.OB t0, s0, s1    A0 A1 B0 B1 C0 C1 D0 D1
MIXH.OB t1, s2, s3    A2 A3 B2 B3 C2 C3 D2 D3
MIXH.OB t2, s4, s5    A4 A5 B4 B5 C4 C5 D4 D5
MIXH.OB t3, s6, s7    A6 A7 B6 B7 C6 C7 D6 D7
MIXH.QH u0, t0, t1    A0 A1 A2 A3 B0 B1 B2 B3
MIXH.QH u1, t2, t3    A4 A5 A6 A7 B4 B5 B6 B7
MIXL.QH u2, t0, t1    C0 C1 C2 C3 D0 D1 D2 D3
MIXL.QH u3, t2, t3    C4 C5 C6 C7 D4 D5 D6 D7
REPA.QH d0, u0, u1    A0 A1 A2 A3 A4 A5 A6 A7
REPB.QH d1, u0, u1    B0 B1 B2 B3 B4 B5 B6 B7
REPA.QH d2, u2, u3    C0 C1 C2 C3 C4 C5 C6 C7
REPB.QH d3, u2, u3    D0 D1 D2 D3 D4 D5 D6 D7
MIXL.OB t0, s0, s1    E0 E1 F0 F1 G0 G1 H0 H1
MIXL.OB t1, s2, s3    E2 E3 F2 F3 G2 G3 H2 H3
MIXL.OB t2, s4, s5    E4 E5 F4 F5 G4 G5 H4 H5
MIXL.OB t3, s6, s7    E6 E7 F6 F7 G6 G7 H6 H7
MIXH.QH u0, t0, t1    E0 E1 E2 E3 F0 F1 F2 F3
MIXH.QH u1, t2, t3    E4 E5 E6 E7 F4 F5 F6 F7
MIXL.QH u2, t0, t1    G0 G1 G2 G3 H0 H1 H2 H3
MIXL.QH u3, t2, t3    G4 G5 G6 G7 H4 H5 H6 H7
REPA.QH d4, u0, u1    E0 E1 E2 E3 E4 E5 E6 E7
REPB.QH d5, u0, u1    F0 F1 F2 F3 F4 F5 F6 F7
REPA.QH d6, u2, u3    G0 G1 G2 G3 G4 G5 G6 G7
REPB.QH d7, u2, u3    H0 H1 H2 H3 H4 H5 H6 H7
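The 24-shuffle transpose can be checked in software. This sketch assumes OB element i occupies bits 8i to 8i+7, so the leftmost entry of each listed row is element 7 (the most significant byte). It forms u0 and u1 with MIXH.QH and u2 and u3 with MIXL.QH in both halves, which is what the listed intermediate results require. All function names are hypothetical.

```c
#include <stdint.h>

static uint64_t ob(uint64_t v, unsigned i) { return (v >> (8 * i)) & 0xFFu; }
static uint64_t qh(uint64_t v, unsigned i) { return (v >> (16 * i)) & 0xFFFFu; }
static uint64_t qh4(uint64_t e0, uint64_t e1, uint64_t e2, uint64_t e3)
{
    return e0 | (e1 << 16) | (e2 << 32) | (e3 << 48);
}

/* MIXH.OB (base 4) / MIXL.OB (base 0): vt[base+k] -> vd[2k], vs[base+k] -> vd[2k+1]. */
static uint64_t mix_ob(uint64_t vs, uint64_t vt, unsigned base)
{
    uint64_t vd = 0;
    for (unsigned k = 0; k < 4; k++)
        vd |= (ob(vt, base + k) << (16 * k)) | (ob(vs, base + k) << (16 * k + 8));
    return vd;
}
static uint64_t mixh_ob(uint64_t vs, uint64_t vt) { return mix_ob(vs, vt, 4); }
static uint64_t mixl_ob(uint64_t vs, uint64_t vt) { return mix_ob(vs, vt, 0); }

static uint64_t mixh_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 2), qh(vs, 2), qh(vt, 3), qh(vs, 3)); }
static uint64_t mixl_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 0), qh(vs, 0), qh(vt, 1), qh(vs, 1)); }
static uint64_t repa_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 2), qh(vt, 3), qh(vs, 2), qh(vs, 3)); }
static uint64_t repb_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 0), qh(vt, 1), qh(vs, 0), qh(vs, 1)); }

/* s[r] holds row r (column 0 in the most significant byte); d[c] receives
   column c in the same layout. Twelve shuffles per half, 24 in total. */
void transpose8x8(const uint64_t s[8], uint64_t d[8])
{
    for (unsigned half = 0; half < 2; half++) {
        /* Columns A-D come from MIXH.OB, columns E-H from MIXL.OB. */
        uint64_t (*mix8)(uint64_t, uint64_t) = half ? mixl_ob : mixh_ob;
        uint64_t t0 = mix8(s[0], s[1]), t1 = mix8(s[2], s[3]);
        uint64_t t2 = mix8(s[4], s[5]), t3 = mix8(s[6], s[7]);
        uint64_t u0 = mixh_qh(t0, t1), u1 = mixh_qh(t2, t3);
        uint64_t u2 = mixl_qh(t0, t1), u3 = mixl_qh(t2, t3);
        d[4 * half + 0] = repa_qh(u0, u1);
        d[4 * half + 1] = repb_qh(u0, u1);
        d[4 * half + 2] = repa_qh(u2, u3);
        d[4 * half + 3] = repb_qh(u2, u3);
    }
}
```

Filling row r with the bytes 0xr0 to 0xr7 (for example, s[1] = 0x1011121314151617) makes each output register read 0x0c, 0x1c, ..., 0x7c for column c, which is easy to check by eye.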



In another example, an exemplary 4x4 array block consists of the following elements:





        d0  d1  d2  d3
  s0    A   B   C   D
  s1    E   F   G   H
  s2    I   J   K   L
  s3    M   N   O   P



A transpose operation of the 4x4 array block in QH mode uses 8 shuffle instructions as follows:

MIXH.QH t0, s0, s1    A E B F
MIXH.QH t1, s2, s3    I M J N
REPA.QH d0, t0, t1    A E I M
REPB.QH d1, t0, t1    B F J N
MIXL.QH t0, s0, s1    C G D H
MIXL.QH t1, s2, s3    K O L P
REPA.QH d2, t0, t1    C G K O
REPB.QH d3, t0, t1    D H L P
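The 8-instruction 4x4 transpose can be replayed the same way. The sketch assumes QH element i occupies bits 16i to 16i+15, so the leftmost listed entry of a row is element 3; transpose4x4 and the helper names are hypothetical, and the comments repeat the intermediate results from the listing.

```c
#include <stdint.h>

static uint64_t qh(uint64_t v, unsigned i) { return (v >> (16 * i)) & 0xFFFFu; }
static uint64_t qh4(uint64_t e0, uint64_t e1, uint64_t e2, uint64_t e3)
{
    return e0 | (e1 << 16) | (e2 << 32) | (e3 << 48);
}

static uint64_t mixh_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 2), qh(vs, 2), qh(vt, 3), qh(vs, 3)); }
static uint64_t mixl_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 0), qh(vs, 0), qh(vt, 1), qh(vs, 1)); }
static uint64_t repa_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 2), qh(vt, 3), qh(vs, 2), qh(vs, 3)); }
static uint64_t repb_qh(uint64_t vs, uint64_t vt) { return qh4(qh(vt, 0), qh(vt, 1), qh(vs, 0), qh(vs, 1)); }

/* s[r] holds row r (column 0 in the most significant halfword). */
void transpose4x4(const uint64_t s[4], uint64_t d[4])
{
    uint64_t t0 = mixh_qh(s[0], s[1]);   /* A E B F */
    uint64_t t1 = mixh_qh(s[2], s[3]);   /* I M J N */
    d[0] = repa_qh(t0, t1);              /* A E I M */
    d[1] = repb_qh(t0, t1);              /* B F J N */
    t0 = mixl_qh(s[0], s[1]);            /* C G D H */
    t1 = mixl_qh(s[2], s[3]);            /* K O L P */
    d[2] = repa_qh(t0, t1);              /* C G K O */
    d[3] = repb_qh(t0, t1);              /* D H L P */
}
```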

Shuffle instructions such as BFLA and BFLB allow reversing the order of elements in an array, in pairs or in groups of 4. Larger groups can be reordered by memory or register address because they are multiples of 64 bits in size. Inverting the order of a large array can be accomplished by inverting each vector of 4 elements with BFLB and loading each doubleword from, or storing it to, the mirrored address in the array. Similarly, a butterfly on a large array can be assembled from doubleword addressing and BFLA or BFLB operations on the addressed doublewords.
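The array-reversal technique just described can be sketched as follows: each doubleword is reversed internally with BFLB (both sources set to the same register, an assumption made here) and written to the mirrored doubleword address. The sketch assumes 16-bit array elements packed four per doubleword, with array element 4i+k in QH element k of doubleword i; reverse_qh_array and bflb_qh are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t qh(uint64_t v, unsigned i) { return (v >> (16 * i)) & 0xFFFFu; }
static uint64_t qh4(uint64_t e0, uint64_t e1, uint64_t e2, uint64_t e3)
{
    return e0 | (e1 << 16) | (e2 << 32) | (e3 << 48);
}

/* BFLB.QH: vt[3] -> vd[0], vs[2] -> vd[1], vt[1] -> vd[2], vs[0] -> vd[3]. */
static uint64_t bflb_qh(uint64_t vs, uint64_t vt)
{
    return qh4(qh(vt, 3), qh(vs, 2), qh(vt, 1), qh(vs, 0));
}

/* Reverse an array of 4*n 16-bit elements held in n doublewords. */
void reverse_qh_array(const uint64_t *in, uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[n - 1 - i] = bflb_qh(in[i], in[i]);   /* store to the mirrored address */
}
```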

The present invention thus provides a method for providing element alignment and ordering for SIMD processing. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the claims below.
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CLAIMg 

What is claimed is: 

1. In a computer system including a processor having a plurality of
registers, a method for generating an aligned vector of first width from two
second width vectors for single instruction multiple data (SIMD) processing,
comprising the steps of:

loading a first vector from a memory unit into a first register, wherein
the first vector contains a first byte of an aligned vector to be generated;

loading a second vector from the memory unit into a second register;

determining a starting byte in the first register wherein the starting byte
specifies the first byte of an aligned vector;

extracting a first width vector from the first register and the second
register beginning from the first bit in the first byte of the first register
continuing through the bits in the second register; and

replicating the extracted first width vector into a third register such that
the third register contains a plurality of elements aligned for SIMD
processing.
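The alignment steps of claim 1 can be illustrated with a minimal Python sketch. It assumes 64-bit (8-byte) first and second widths, as in claim 3, and big-endian byte numbering (byte 0 of a register is the byte at the lowest memory address, cf. claim 9). The names `load_aligned` and `as_elements` are hypothetical, and the returned bytes simply stand in for the third register.

```python
def load_aligned(memory: bytes, addr: int, width: int = 8) -> bytes:
    """Return `width` bytes starting at an arbitrary (possibly unaligned) address.

    Hypothetical model of claim 1, big-endian byte numbering assumed.
    """
    base = addr - addr % width                      # aligned doubleword holding the first byte
    first = memory[base:base + width]               # first register
    second = memory[base + width:base + 2 * width]  # second register (next doubleword)
    start = addr % width                            # starting byte in the first register
    # Extract `width` bytes from the starting byte of the first register,
    # continuing through the second register.
    return (first + second)[start:start + width]


def as_elements(vec: bytes, size: int) -> list:
    # View the aligned vector as big-endian elements of `size` bytes each.
    return [int.from_bytes(vec[i:i + size], "big") for i in range(0, len(vec), size)]


mem = bytes(range(32))
third = load_aligned(mem, 5)
print(list(third))            # bytes 5..12
print(as_elements(third, 2))  # the same 64 bits viewed as four 16-bit elements
```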

2. The method as recited in Claim 1 further comprising the step of
storing the aligned vector in the third register to the memory unit.

3. The method as recited in Claim 1, wherein the first width and
second width are each 64 bits.

4. The method as recited in Claim 3, wherein the third register is
comprised of 8 8-bit elements.



SGI-15-4-457.00 



October 7, 1997 



5. The method as recited in Claim 3, wherein the third register is 
comprised of 4 16-bit elements. 

6. The method as recited in Claim 1, wherein the starting byte is 
specified as a constant in an alignment instruction. 

7. The method as recited in Claim 1, wherein the starting byte is 
specified as a variable in a register in an alignment instruction. 

8. The method as recited in Claim 1, wherein the first vector and 
the second vector are in contiguous location in the memory unit. 

9. The method as recited in Claim 1, wherein the processor 
operates in a big-endian byte ordering mode. 

10. The method as recited in Claim 1, wherein the processor 
operates in a little-endian byte ordering mode. 

11. In a computer system including a processor having a plurality of 
registers, a method for generating an ordered set of elements in an N-bit 
vector from two sets of elements in two N-bit vectors for single instruction 
multiple data (SIMD) vector processing, said method comprising the steps of: 

loading a first vector from a memory unit into a first register; 
loading a second vector from the memory unit into a second register; 
selecting a subset of elements from the first register and the second 
register; and 
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replicating the elements from the subset into the elements in the third 
register in a particular order suitable for subsequent SIMD vector processing, 

12. The method as recited in Claim 11 further comprising the step of
storing the elements in the third register to the memory unit.

13. The method as recited in Claim 11, wherein the first vector and
the second vector are each comprised of 4 16-bit elements indexed from 0 to 3.

14. The method as recited in Claim 11, wherein the first vector and
the second vector are each comprised of 8 8-bit elements indexed from 0 to 7.

15. The method as recited in Claim 13, wherein the subset is
comprised of two elements from the first register and two elements from the
second register.

16. The method as recited in Claim 14, wherein the subset is
comprised of four elements from the first register and four elements from the
second register.

17. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 2 and 3 from the first register and the elements 2 
and 3 from the second register. 

18. The method as recited in Claim 17, wherein the particular order
of the elements in the third register comprises:

the element 0 replicated from the element 2 of the second register; 
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the element 1 replicated from the element 2 of the first register; 

the element 2 replicated from the element 3 of the second register; and 

the element 3 replicated from the element 3 of the first register. 

19. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 0 and 1 from the first register and the elements 0 
and 1 from the second register. 

20. The method as recited in Claim 19, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 0 of the first register; 
the element 2 replicated from the element 1 of the second register; and 
the element 3 replicated from the element 1 of the first register. 
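The element orders recited in claims 18 and 20 amount to interleaving the upper or lower halves of the two registers. The following Python sketch assumes element i of a register is stored at list index i; the function name `mix` is hypothetical and is not taken from the specification.

```python
def mix(first, second, upper):
    """Interleave the lower or upper halves of two registers.

    Even-numbered result elements come from the second register and
    odd-numbered ones from the first, as in the orders of claims 18 and 20.
    """
    n = len(first)
    base = n // 2 if upper else 0
    out = []
    for k in range(n // 2):
        out.append(second[base + k])  # element 2k
        out.append(first[base + k])   # element 2k + 1
    return out


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
print(mix(a, b, upper=True))   # claim 18: ['b2', 'a2', 'b3', 'a3']
print(mix(a, b, upper=False))  # claim 20: ['b0', 'a0', 'b1', 'a1']
```

With 8-element inputs, the same function reproduces the orders of claims 38 (elements 4-7) and 40 (elements 0-3).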

21. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 1 and 3 from the first register and the elements 1 
and 3 from the second register. 

22. The method as recited in Claim 21, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 1 of the second register; 
the element 1 replicated from the element 3 of the second register; 
the element 2 replicated from the element 1 of the first register; and 
the element 3 replicated from the element 3 of the first register. 
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23. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 0 and 2 from the first register and the elements 0 
and 2 from the second register. 

24. The method as recited in Claim 23, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 2 of the second register; 
the element 2 replicated from the element 0 of the first register; and 
the element 3 replicated from the element 2 of the first register. 

25. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 0 and 2 from the first register and the elements 1 
and 3 from the second register. 

26. The method as recited in Claim 25, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 1 of the second register; 
the element 1 replicated from the element 0 of the first register; 
the element 2 replicated from the element 3 of the second register; and 
the element 3 replicated from the element 2 of the first register. 

27. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 0 and 2 from the first register and the elements 1 
and 3 from the second register. 
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28, The method as recited in Claim 11, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 3 of the second register; 
the element 1 replicated from the element 2 of the first register; 
the element 2 replicated from the element 1 of the second register; and 
the element 3 replicated from the element 0 of the first register. 

29. The method as recited in Claim 13, wherein the subset is
comprised of the elements 2 and 3 from the first register and the elements 2 
and 3 from the second register. 

30. The method as recited in Claim 29, wherein particular order of
the elements in the third register comprises: 

the element 0 replicated from the element 2 of the second register; 
the element 1 replicated from the element 3 of the second register; 
the element 2 replicated from the element 2 of the first register; and 
the element 3 replicated from the element 3 of the first register. 

31. The method as recited in Claim 13, wherein the subset is
comprised of the elements 0 and 2 from the first register and the elements 0 
and 1 from the second register. 

32. The method as recited in Claim 31, wherein the particular order
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 1 of the second register; 
the element 2 replicated from the element 0 of the first register; and 
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the element 3 replicated from the element 2 of the first register. 

33. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 1, 3, 5, and 7 from the first register and the 
elements 1, 3, 5, and 7 from the second register. 

34. The method as recited in Claim 33, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 1 of the second register; 
the element 1 replicated from the element 3 of the second register; 
the element 2 replicated from the element 5 of the second register; 
the element 3 replicated from the element 7 of the second register;
the element 4 replicated from the element 1 of the first register; 
the element 5 replicated from the element 3 of the first register; 
the element 6 replicated from the element 5 of the first register; and 
the element 7 replicated from the element 7 of the first register. 

35. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 0, 2, 4, and 6 from the first register and the 
elements 0, 2, 4, and 6 from the second register. 

36. The method as recited in Claim 35, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 2 of the second register; 
the element 2 replicated from the element 4 of the second register; 
the element 3 replicated from the element 6 of the second register; 






the element 4 replicated from the element 0 of the first register; 
the element 5 replicated from the element 2 of the first register; 
the element 6 replicated from the element 4 of the first register; and 
the element 7 replicated from the element 6 of the first register. 

37. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 4, 5, 6, and 7 from the first register and the 
elements 4, 5, 6, and 7 from the second register. 

38. The method as recited in Claim 37, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 4 of the second register; 
the element 1 replicated from the element 4 of the first register; 
the element 2 replicated from the element 5 of the second register; 
the element 3 replicated from the element 5 of the first register; 
the element 4 replicated from the element 6 of the second register; 
the element 5 replicated from the element 6 of the first register; 
the element 6 replicated from the element 7 of the second register; and 
the element 7 replicated from the element 7 of the first register. 

39. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 0, 1, 2, and 3 from the first register and the 
elements 0, 1, 2, and 3 from the second register. 

40. The method as recited in Claim 39, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
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the element 1 replicated from the element 0 of the first register; 
the element 2 replicated from the element 1 of the second register; 
the element 3 replicated from the element 1 of the first register; 
the element 4 replicated from the element 2 of the second register; 
the element 5 replicated from the element 2 of the first register;

the element 6 replicated from the element 3 of the second register; and 
the element 7 replicated from the element 3 of the first register. 

41. The method as recited in Claim 14, wherein the subset is
comprised of the elements 4, 5, 6, and 7 from the first register.

42. The method as recited in Claim 41, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 4 of the first register; 
the element 2 replicated from the element 5 of the first register;

the element 4 replicated from the element 6 of the first register; 
the element 6 replicated from the element 7 of the first register; and 
the elements 1, 3, 5, and 7 containing a zero in all the bits. 
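Claims 42 and 44 place four 8-bit elements in the even positions and zero the odd positions. If one assumes (this is not stated in the claims) that element i occupies bits 8i through 8i+7, each even/odd pair forms one 16-bit element, and the operation is a zero-extension of four bytes to four halfwords. A minimal Python sketch under that assumption, with hypothetical function names:

```python
def unpack_zero(first, upper):
    """Spread four bytes of `first` into the even elements; odd elements are zero.

    upper=True follows claim 42 (elements 4-7); upper=False follows claim 44
    (elements 0-3).
    """
    base = 4 if upper else 0
    out = [0] * 8
    for k in range(4):
        out[2 * k] = first[base + k]
    return out


def halfwords(elems):
    # Assumption: element 2k is the low byte and element 2k+1 the high byte
    # of 16-bit element k.
    return [elems[2 * k] | (elems[2 * k + 1] << 8) for k in range(len(elems) // 2)]


r = unpack_zero([1, 2, 3, 4, 5, 6, 7, 0xFF], upper=True)
print(r)             # [5, 0, 6, 0, 7, 0, 255, 0]
print(halfwords(r))  # [5, 6, 7, 255]
```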

43. The method as recited in Claim 14, wherein the subset is

comprised of the elements 0, 1, 2, and 3 from the first register.

44. The method as recited in Claim 43, wherein the particular order 
of the elements in the third register comprises: 
the element 0 replicated from the element 0 of the first register;

the element 2 replicated from the element 1 of the first register; 
the element 4 replicated from the element 2 of the first register; 
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the element 6 replicated from the element 3 of the first register; and 
the elements 1, 3, 5, and 7 containing a zero in all the bits. 

45. The method as recited in Claim 14, wherein the subset is
comprised of the elements 4, 5, 6, and 7 from the first register.

46. The method as recited in Claim 45, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 4 of the first register; 
the element 1 replicating the sign bit of the element 4 of the first

register in all the bits; 

the element 2 replicated from the element 5 of the first register; 
the element 3 replicating the sign bit of the element 5 of the first 
register in all the bits; 
the element 4 replicated from the element 6 of the first register;

the element 5 containing the sign bit of the element 6 of the first 
register in all the bits; 

the element 6 replicated from the element 7 of the first register; and 
the element 7 containing the sign bit of the element 7 of the first 
register in all the bits.

47. The method as recited in Claim 14, wherein the subset is
comprised of the elements 0, 1, 2, and 3 from the first register. 

48. The method as recited in Claim 47, wherein the particular order

of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the first register; 
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the element 1 containing the sign bits of the element 0 of the first 
register; 

the element 2 replicated from the element 1 of the first register; 
the element 3 containing the sign bits of the element 1 of the first 
register; 

the element 4 replicated from the element 2 of the first register; 
the element 5 containing the sign bits of the element 2 of the first 
register; 

the element 6 replicated from the element 3 of the first register; and 
the element 7 containing the sign bits of the element 3 of the first 
register. 
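Claims 46 and 48 differ from claims 42 and 44 only in filling each odd element with the sign bit of the byte placed just below it. Under the same assumed bit numbering (element i occupies bits 8i through 8i+7), this is a sign-extension of four 8-bit elements to four 16-bit elements. The sketch below is illustrative only, and its function names are hypothetical.

```python
def unpack_sign(first, upper):
    """Spread four bytes into the even elements; each odd element is filled
    with the sign bit of the byte just below it (claims 46 and 48)."""
    base = 4 if upper else 0
    out = [0] * 8
    for k in range(4):
        b = first[base + k] & 0xFF
        out[2 * k] = b
        out[2 * k + 1] = 0xFF if b & 0x80 else 0x00  # sign bit in all the bits
    return out


def signed_halfwords(elems):
    # Same assumption: element 2k is the low byte, element 2k+1 the high byte.
    vals = []
    for k in range(len(elems) // 2):
        v = elems[2 * k] | (elems[2 * k + 1] << 8)
        vals.append(v - 0x10000 if v & 0x8000 else v)
    return vals


r = unpack_sign([0x7F, 0x80, 0x01, 0xFF, 0, 0, 0, 0], upper=False)
print(signed_halfwords(r))  # [127, -128, 1, -1]
```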
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ABSTRACT 

The present invention provides alignment and ordering of vector
elements for SIMD processing. In the alignment of vector elements for SIMD
processing, a first vector is loaded from a memory unit into a first register and
a second vector is loaded from the memory unit into a second register. The
first vector contains a first byte of an aligned vector to be generated. Then, a
starting byte specifying the first byte of the aligned vector is determined. Next,
a vector is extracted from the first register and the second register beginning
from the first bit in the first byte of the first register continuing through the
bits in the second register. Finally, the extracted vector is replicated into a
third register such that the third register contains a plurality of elements
aligned for SIMD processing. In the ordering of vector elements for SIMD
processing, a first vector is loaded from a memory unit into a first register and
a second vector is loaded from the memory unit into a second register. Then,
a subset of elements is selected from the first register and the second register.
The elements from the subset are then replicated into the elements in a
third register in a particular order suitable for subsequent SIMD vector
processing.
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IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re application of: 



Van Hook et al 



Art Unit: To be assigned 
Examiner: To be assigned 
Atty. Docket: 1778.0100002 



Appl. No.: To be assigned

Filed: Herewith



(0055.20US) 



For: 



Alignment and Ordering of 
Vector Elements for Single 
Instruction Multiple Data 
Processing 



Preliminary Amendment 



Commissioner for Patents 
Washington, D.C. 20231

Sir: 

Prior to examination of the above-captioned application, Applicant submits the following
Amendments and Remarks. 

In the Specification: 

Page 1, between lines 3 and 4, please insert the following: 

--This application is a continuation of U.S. Patent Application No. 09/263,798, filed
March 5, 1999, which is a continuation of U.S. Patent Application No. 08/947,649, filed October
9, 1997, now U.S. Patent No. 5,933,650, issued August 3, 1999.--

In the Claims: 

Please cancel claims 2-48 without prejudice or disclaimer. 
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Appl. No.: To be assigned 



Remarks 



Upon entry of the foregoing amendments, claim 1 is pending in the application. Claims
2-48 are sought to be canceled without prejudice or disclaimer. The above amendments are to
matters of form only and are believed to introduce no new matter. Their entry is respectfully
requested.

The Examiner is invited to telephone the undersigned representative if he believes that 
an interview might be useful for any reason. 



Respectfully submitted, 



Sterne, Kessler, Goldstein & Fox p.l.l.c. 




Michael B. Ray 
Attorney for Applicant 
Registration No. 33,997 



Date: 




1100 New York Avenue, N.W.
Suite 600 

Washington, D.C. 20005-3934 
(202) 371-2600 
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IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re application of: 

Timothy J. Van Hook et al 
Appl. No. 09/662,832
Filed: September 15, 2000 



Confirmation No. : (to be assigned) 



Art Unit: 2183 



Examiner: (to be assigned) 

Atty. Docket: 1778.0100002 
(0055.20US) 



For: 



Alignment and Ordering of 
Vector Elements For Single 
Instruction Multiple Data 
Processing 



Second Preliminary Amendment 



Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

Prior to examination of the above-captioned application, Applicants submit the
following Amendment and Remarks. This Amendment is provided in the following 
format: 

(A) A clean version of each replacement paragraph/section/claim along 
with clear instructions for entry; 

(B) Starting on a separate page, appropriate remarks and arguments. 37
C.F.R. § 1.111 and MPEP 714; and

(C) Starting on a separate page, a marked-up version entitled: "Version
with markings to show changes made."

It is not believed that extensions of time or fees for net addition of claims are 
required beyond those that may otherwise be provided for in documents accompanying 
this paper. However, if additional extensions of time are necessary to prevent 
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abandonment of this application, then such extensions of time are hereby petitioned 
under 37 C.F.R. § 1.136(a), and any fees required therefor (including fees for net
addition of claims) are hereby authorized to be charged to our Deposit Account No. 



In the Claims: 

Please add the following new claims: 

49. (New) A method for generating an aligned vector from two source vectors for 
single instruction multiple data (SIMD) processing, comprising the steps of: 

(1) loading a first source vector into a first register; 

(2) loading a second source vector into a second register; 

(3) reading a first plurality of elements from said first register and a second 
plurality of elements from said second register; and 

(4) writing said first plurality of elements and said second plurality of 
elements into a third register in a particular order to produce a target vector having a 
plurality of elements aligned for SIMD processing. 
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50. (New) The method as recited in claim 49, wherein said writing step 
comprises: 

writing even-numbered, lower elements of said first register to said third register; 

and 

writing sign bits of odd-numbered, lower elements of said first register to said 
third register. 

51. (New) The method as recited in claim 49, wherein said writing step 
comprises: 

writing even-numbered, upper elements of said first register to said third register; 

and 

writing sign bits of odd-numbered, upper elements of said first register to said 
third register. 

52. (New) A method for generating an ordered set of elements in a target vector 
from elements in a first source vector and a second source vector for single instruction 
multiple data (SIMD) vector processing, comprising the steps of: 

(1) loading the first source vector into a first register; 

(2) loading the second source vector into a second register; 

(3) selecting a first subset of elements from said first register, said first subset 
comprising any one of the following groups of elements from the first source vector: odd 
elements, even elements, lower elements and upper elements; and 
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(4) selecting a second subset of elements from said second register, said 
second subset comprising any one of the following groups of elements from the second 
source vector: odd elements, even elements, lower elements and upper elements.

53. (New) The method of claim 52, further comprising the step of: 

(5) writing said first and said second subset of elements into a third register to 
facilitate a particular SIMD vector processing operation, said first subset being written 
into any one of the following groups of elements in said third register: upper elements, 
odd elements, and odd elements in reverse order, and said second subset being written 
into any one of the following groups of elements in said third register: lower elements, 
even elements, and even elements in reverse order, wherein elements written into said 
third register comprise the target vector. 
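The groups named in claims 52 and 53 can be modeled with list slicing. The following Python sketch is illustrative only: element i sits at list index i, and the functions `select` and `write` and the layout names `"halves"` and `"interleave"` are hypothetical. The "halves" layout writes the first subset to the upper elements and the second to the lower elements. The "interleave" layout writes the first subset to the odd elements and the second to the even elements, optionally each in reverse order.

```python
def select(vec, group):
    """Pick one of the element groups named in claim 52."""
    n = len(vec)
    if group == "even":
        return vec[0::2]
    if group == "odd":
        return vec[1::2]
    if group == "lower":
        return vec[:n // 2]
    if group == "upper":
        return vec[n // 2:]
    raise ValueError(f"unknown group: {group}")


def write(first_sub, second_sub, layout, reverse=False):
    """Write two equal-size subsets into a target register (claim 53)."""
    n = len(first_sub) + len(second_sub)
    out = [None] * n
    if layout == "halves":
        out[:n // 2] = second_sub  # lower elements
        out[n // 2:] = first_sub   # upper elements
    elif layout == "interleave":
        if reverse:
            first_sub, second_sub = first_sub[::-1], second_sub[::-1]
        out[1::2] = first_sub      # odd elements
        out[0::2] = second_sub     # even elements
    else:
        raise ValueError(f"unknown layout: {layout}")
    return out


f = ["f0", "f1", "f2", "f3"]
s = ["s0", "s1", "s2", "s3"]
print(write(select(f, "even"), select(s, "odd"), "interleave"))                # order of claim 26
print(write(select(f, "even"), select(s, "odd"), "interleave", reverse=True))  # order of claim 28
```

The two calls yield `['s1', 'f0', 's3', 'f2']` and `['s3', 'f2', 's1', 'f0']`, the element orders recited in claims 26 and 28. With 8-element vectors, `write(select(f8, "odd"), select(s8, "odd"), "halves")` gives the order of claim 34.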
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Remarks 



Upon entry of the foregoing amendment, claims 1 and 49-53 are pending in the 
application, with 1, 49 and 52 being the independent claims. New claims 49-53 are 
sought to be added. These changes are believed to introduce no new matter, and their 
entry is respectfully requested. 

Consideration of all pending claims is respectfully solicited. The Examiner is 
invited to telephone the undersigned representative if he believes that an interview might 
be useful for any reason. 



Respectfully submitted,



Sterne, Kessler, Goldstein & Fox p.l.l.c. 




Michael B. Ray 
Attorney for Applicants
Registration No. 33,997 




1100 New York Avenue, N.W., Suite 600
Washington, D.C. 20005-3934 
(202) 371-2600 
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IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re application of: 



Confirmation No.: 2552 



Timothy J. Van Hook et al
Appl. No. 09/662,832
Filed: September 15, 2000
For: Alignment and Ordering of
Vector Elements for Single
Instruction Multiple Data
Processing

Examiner: Pan, D.

Atty. Docket: 1778.0100002 (0055.20US)

Art Unit: 2183



Amendment And Reply Under 37 C.F.R. § 1.111 



Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

In reply to the Office Action dated August 24, 2001, (PTO Prosecution File Wrapper 
Paper No. 8), Applicants submit the following Amendment and Remarks. This Amendment 
is provided in the following format: 

(A) A clean version of each replacement paragraph/section/claim along with 
clear instructions for entry; 

(B) Starting on a separate page, appropriate remarks and arguments. 37 
C.F.R. § 1.111 and MPEP 714; and

(C) Starting on a separate page, a marked-up version entitled: "Version with
markings to show changes made."

It is not believed that extensions of time or fees for net addition of claims are 
required beyond those that may otherwise be provided for in documents accompanying this 
paper. However, if additional extensions of time are necessary to prevent abandonment of 
this application, then such extensions of time are hereby petitioned under 37 C.F.R. 
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§ 1.136(a), and any fees required therefor (including fees for net addition of claims) are 
hereby authorized to be charged to our Deposit Account No. 19-0036. 

Amendments 

In the Claims: 

Please substitute the following claim 1 for the pending claim 1:

1. (once amended) In a computer system including a processor having a plurality of
registers, a method for generating an aligned vector of first width from two second width
vectors for single instruction multiple data (SIMD) processing, comprising the steps of:

loading a first vector from a memory unit into a first register, wherein the
first vector contains a first byte of the aligned vector to be generated; 

loading a second vector from the memory unit into a second register;

determining a starting byte in the first register wherein the starting byte 
specifies the first byte of the aligned vector; 

extracting the aligned vector from the first register and the second register
beginning from the first bit in the starting byte of the first register continuing through bits
in the second register; and 

replicating the aligned vector into a third register such that the third register 
contains a plurality of elements aligned for SIMD processing. 
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Remarks 

Reconsideration of this Application is respectfully requested. 

Upon entry of the foregoing amendment, claims 1 and 49-53 are pending in the 
application, with claims 1, 49, and 52 being the independent claims. Claim 1 is sought to 
be amended. No new matter is embraced by this amendment and its entry is respectfully 
solicited. Based on this amendment and the remarks set forth below, it is respectfully 
requested that the Examiner reconsider and withdraw all outstanding objections and 
rejections. 

The Applicant would like to thank the Examiner for the interview on January 10,
2002. During that interview the differences between the claimed invention and U.S. Patent
No. 5,887,183 to Agarwal et al. were discussed. The Examiner acknowledged the
differences and indicated that he would look into the case in the next response. 

Description of the Invention 

The present invention is directed to a method of processing instructions for Single 
Instruction/Multiple Data (hereinafter, "SIMD") processing. The method includes steps of 
aligning and ordering vector elements for SIMD processing. To align the vectors for SIMD 
processing, a first vector is loaded into a first register, and a second vector is loaded into a 
second register. A first width vector is extracted from the first register and the second 
register. The resulting extracted vector starts from a first bit of a starting byte in the first 
width vector and continues through bits of the second width vector. The extracted vector 
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is then replicated into a third register. To order elements for SIMD processing, a first subset 
of elements (which may include odd bytes, even bytes, upper bytes, lower bytes or any 
combination of these) is chosen from the first register and a second subset of elements 
(which may include odd bytes, even bytes, upper bytes, lower bytes or any combination of 
these) is chosen from the second register. The chosen elements are replicated into the third 
register. 

An advantage of the present invention is that it allows for faster parallel processing 
of instructions to be used for SIMD processing. Furthermore, the present invention allows 
processing, for example, of 64-bit vectors containing different size elements (e.g., a 64-bit 
vector may contain eight 8-bit elements, four 16-bit elements, two 32-bit elements or one 
64-bit element). The vectors are processed according to the element subdivision within each 
vector register. 
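The point that one 64-bit vector may hold eight 8-bit, four 16-bit, two 32-bit or one 64-bit element, and is processed according to that subdivision, can be illustrated with a minimal Python sketch. The element numbering (element 0 in the least significant bits) and the choice of a wraparound add as the example operation are assumptions made for illustration; the function names are hypothetical.

```python
def split_elements(value, bits):
    """Split a 64-bit value into 64 // bits elements (element 0 = least significant)."""
    mask = (1 << bits) - 1
    return [(value >> (i * bits)) & mask for i in range(64 // bits)]


def join_elements(elems, bits):
    """Pack elements back into one 64-bit value, truncating each to `bits`."""
    mask = (1 << bits) - 1
    value = 0
    for i, e in enumerate(elems):
        value |= (e & mask) << (i * bits)
    return value


def simd_add(a, b, bits):
    """Add two 64-bit registers element by element; carries never cross elements."""
    sums = [x + y for x, y in zip(split_elements(a, bits), split_elements(b, bits))]
    return join_elements(sums, bits)


for bits in (8, 16, 32, 64):
    print(bits, hex(simd_add(0x00000000000000FF, 0x0000000000000001, bits)))
```

The same two 64-bit inputs produce 0x0 with 8-bit elements (the carry out of element 0 is discarded) and 0x100 with 16-, 32- or 64-bit elements, showing how the element subdivision changes the result.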

Rejections under 35 U.S.C. § 112 

In the Office Action dated August 24, 2001, the Examiner has rejected claim 1 under 
35 U.S.C. § 112, ¶ 2, as being indefinite for failing to particularly point out and distinctly claim
the subject matter which the Applicants regard as their invention. Claim 1 is sought to be
amended to correct this minor antecedent basis error. In view of this amendment,
Applicants respectfully request that this rejection be reconsidered and withdrawn.
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Rejections under 35 U.S.C. § 103 

In the Office Action dated August 24, 2001, the Examiner has rejected claims 1, 49, 
and 52 under 35 USC § 103(a) as being unpatentable over U.S. Patent No. 5,887,183 to 
Agarwal et al. in view of U.S. Patent No. 5,922,066 to Cho et al.

With respect to claim 1, the Examiner states that all of the steps of claim 1 except
the last step of replicating the first width vector into a third register are described by
the Agarwal reference and the step of replicating is described by the Cho reference. With
respect to claim 49, the Examiner gives a similar rejection under 35 USC § 103(a) and states 
that even though the Agarwal reference does not specifically describe the writing step of 
claim 49, the Cho reference provides such step. With respect to claim 52, the Examiner 
states that the Agarwal reference does not specifically describe the step of selecting the first 
subset of elements and the step of selecting the second subset of elements; however, the
Examiner states that the Cho reference provides such steps. The Examiner states that it would
have been obvious to one of ordinary skill in the art to combine the two references to 
produce the claimed subject matter of claim 52. 

The Examiner's rejections are respectfully traversed for the reasons set forth below,
and the Examiner is respectfully requested to reconsider and withdraw any objections or
rejections with respect to claims 1, 49, and 52. 

Agarwal describes a method and a system for loading and storing vectors (having a 
plurality of elements) in a plurality of modes. The vectors are stored in an input storage 
area, then the elements of the vectors are transferred from the input storage area into a vector
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register interface unit. From the vector interface unit, the vector elements are transferred to 
addressable locations in a preselected pattern in the output storage area. See Agarwal at col. 
9, lines 66-67; col. 10, lines 1-5; col. 10, lines 46-53; and FIGS. 4A and 4B. The vector
elements may be stored in a designated pattern (where such pattern has alternating real and 
imaginary elements) in the input storage area. Furthermore, the vector elements may be
separated into real and imaginary elements in the output storage area. 

Cho describes a system and a method for aligning data for load and/or store 
instructions. The system of Cho is capable of rotating and shifting data for arithmetic logic 
instructions. Cho describes an aligner that is adapted to align data as required for load/store
instructions and to shift data elements within an operand as required for a shift instruction.
Cho uses the same circuit to perform both operations, thus reducing the overall circuit size.
Cho provides for a SIMD processor including an instruction fetch unit, dual instruction 
decoders, a vector register file, a scalar register file and dual execution units capable of 
operating in parallel. The instruction decoder of Cho decodes the instructions, which the 
execution units execute. Such instructions may be to add, subtract, divide or multiply source 
operands. 

The system of Cho is capable of loading/storing data and shifting the data within an
operand (which requires a logical or arithmetic operation, left or right shift, rotate, or change
of position of the elements in the operand). The instruction fetch unit fetches up to two
instructions for each of the decoders per cycle. The decoders process
and pass each instruction to either the vector or the scalar register file. The vector register file
contains 64 32-byte vector registers that are organized into two banks of 32 vector registers.
The scalar register file contains 32 32-bit scalar registers, where each register
contains a single 8-bit, 16-bit, or 32-bit value. After processing, the results are forwarded
to execution units containing aligners. The execution units perform such functions as
load, store, and data move operations. The aligner is used in the load and store operations
performed by the execution units. For example, for an unaligned load operation, the
execution unit requests two 32-byte data vectors from the memory system, and a resultant
vector is constructed from the two. For an unaligned store, the
aligner rotates the data elements in the data storage so that the elements are in their correct
positions for storage in two cache lines in the memory system. The aligner has an input
select circuit, an element select circuit, and a rotation circuit. For data element shifting or
rotating, the input select circuit selects one vector operand and another operand, which may
be a vector or a scalar, and shifts them into a resultant vector when the first vector operand
is shifted. The shifting and rotating performed by the aligner include shifting by N
byte-size elements, where N is an integer representing the number of elements to
be shifted or rotated. See Cho at col. 6, lines 18-39.

Claim 1 



Claim 1 recites the steps of: 

loading a first vector from a memory unit into a first register, 
wherein the first vector contains a first byte of the aligned vector to be 
generated; 

loading a second vector from the memory unit into a second register; 
determining a starting byte in the first register wherein the starting 
byte specifies the first byte of the aligned vector; 
extracting the aligned vector from the first register and the second
register beginning from the first bit in the starting byte of the first register 
continuing through bits in the second register; and 

replicating the aligned vector into a third register such that the third 
register contains a plurality of elements aligned for SIMD processing. 
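For readers less familiar with SIMD alignment, the determining and extracting steps recited above can be sketched in Python. This is a hypothetical illustration only, not the Applicants' implementation: it assumes 64-bit registers modeled as Python integers, byte 0 being the most significant byte of a register, and a starting byte index from 0 through 7. The name `extract_aligned` is invented for the sketch.

```python
MASK64 = (1 << 64) - 1  # 64-bit register width (assumed for illustration)


def extract_aligned(first: int, second: int, start_byte: int) -> int:
    """Return the 64-bit aligned vector that begins at `start_byte` of `first`
    and continues into `second`. Byte 0 is the most significant byte."""
    if not 0 <= start_byte <= 7:
        raise ValueError("start_byte must be between 0 and 7")
    # Concatenate the two registers into one 128-bit value: first:second.
    combined = ((first & MASK64) << 64) | (second & MASK64)
    # Drop the low-order bytes that fall past the end of the aligned vector.
    drop_bits = (8 - start_byte) * 8
    return (combined >> drop_bits) & MASK64


first_reg = 0x0001020304050607   # holds the first byte of the aligned vector
second_reg = 0x08090A0B0C0D0E0F  # the next 8 bytes from memory
third_reg = extract_aligned(first_reg, second_reg, 3)
print(f"{third_reg:016x}")  # 030405060708090a
```

With a starting byte of 3, the result takes bytes 3 through 7 of the first register followed by bytes 0 through 2 of the second register; a starting byte of 0 simply returns the first register unchanged.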

As indicated above, the Examiner has acknowledged that Agarwal does not teach the
replicating step. In addition, Agarwal does not teach the steps of loading a second vector into
a second register, determining a starting byte, and extracting an aligned vector.

With respect to the loading step, Agarwal loads only a first vector (i.e., from vector
register interface unit 216) into one or two registers (i.e., in vector register array 238).
Further, Agarwal does not teach or suggest the claimed step of determining a starting byte
in a first register. Instead, Agarwal stores or loads the vectors consecutively as they appear
in either the input storage area or the load area. The vectors may be stored in different patterns,
such as reverse order or separated into real and imaginary components. However, Agarwal
does not determine which element is the starting byte of the first vector register.

Agarwal also does not teach or suggest the claimed step of extracting the aligned
vector from first and second registers beginning with the starting byte. Agarwal merely
stores or loads vector register elements to and from addressable locations in preselected
patterns. For example, in Agarwal, vector elements are transferred from the vector register
interface unit 216 to a vector register in vector register array 238. See FIG. 4A. However,
vectors are not taken from first and second registers, and no alignment takes place in the
step of writing a vector to a vector register in array 238. Thus, Agarwal contains no
teaching or suggestion of an extracting step.
Cho does not provide the missing features. For example, Cho does not teach or
suggest at least the determining and extracting steps. What Cho describes is a method
of shifting and rotating elements within a vector register, where a shifting operation is
performed by shifting an entire first vector operand into another vector register and shifting
other elements into that vector register. See Cho at col. 6, lines 18-39. This is different
from the present invention, as recited in claim 1, where elements in each vector register are
selected by extracting a vector from the first vector register and the second vector register.

Therefore, neither Agarwal nor Cho teaches or suggests every element of claim 1.
Furthermore, the combination of Agarwal and Cho does not provide any teaching or
suggestion that would lead a person of ordinary skill in the art to create the combination of
steps recited in claim 1. While Agarwal discloses a system for loading or storing vectors into
registers, the vectors are already pre-aligned before they are stored or loaded. The system
in Agarwal determines what type of vectors (stride-1, complex, stride-n, or stride-(-1)) are
to be loaded/stored and loads/stores them accordingly. Cho merely describes a system of
data processing that is capable of shifting and rotating data to be used for load/store
functions. The combination of Agarwal and Cho would produce a system capable of
performing load and store functions on vectors and performing shift operations to align the
vectors from load and store operations. However, the combination would not produce the
claimed method including the steps of determining a starting byte, extracting an aligned
vector, and replicating the aligned vector into a third register for further SIMD processing.
Accordingly, Applicants submit that the rejection of claim 1 under 35 U.S.C. § 103 is
improper. Based on these remarks, it is respectfully requested that the Examiner reconsider
and withdraw the rejection of claim 1.



Claim 49

Claim 49 recites the steps of loading a first source vector into a first register, loading
a second source vector into a second register, reading a first plurality of elements from the
first register and a second plurality of elements from the second register, and writing the first
and second pluralities of elements into a third register. As discussed above with respect to
claim 1, neither Agarwal nor Cho teaches or suggests these claimed features. Accordingly,
claim 49 is patentable over the combination of Agarwal and Cho for the same reasons
discussed above with respect to claim 1. Reconsideration and withdrawal of the rejection
of claim 49 is thus respectfully requested.

Claim 52

Claim 52 recites the steps of loading first and second source vectors into
first and second registers, respectively, selecting a first subset from the first register (where
the subset is selected from odd, even, lower, and upper elements), and selecting a second
subset from the second register (where the subset is selected from odd, even, lower, and
upper elements). Agarwal does not teach or suggest selecting the subsets from each
respective register. As discussed above with respect to claim 1, Agarwal describes loading
and storing elements in vector registers as they originally appear. The elements might be
subdivided into real and imaginary elements or might be stored in reverse order in resulting
vectors. However, claim 52 recites steps of selecting elements from the first and second
registers to be placed into the third register. These steps are neither taught nor suggested by
Agarwal.
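The selecting steps of claim 52 can likewise be sketched under stated assumptions. In this hypothetical Python model, each register is a list of element values indexed from 0, "lower" and "upper" mean the lower-indexed and upper-indexed halves of a register, and the first subset is placed ahead of the second subset in the third register. The claim language summarized above does not specify that ordering, so it is an assumption of the sketch, as are the function names.

```python
from typing import List, Sequence


def select_subset(reg: Sequence[int], kind: str) -> List[int]:
    """Return half of the register's elements: 'even', 'odd', 'lower', or 'upper'."""
    half = len(reg) // 2
    if kind == "even":
        return list(reg[0::2])   # elements 0, 2, 4, ...
    if kind == "odd":
        return list(reg[1::2])   # elements 1, 3, 5, ...
    if kind == "lower":
        return list(reg[:half])  # lower-indexed half
    if kind == "upper":
        return list(reg[half:])  # upper-indexed half
    raise ValueError(f"unknown subset kind: {kind!r}")


def combine_subsets(first: Sequence[int], second: Sequence[int],
                    kind1: str, kind2: str) -> List[int]:
    """Build the third register: the subset of `first` followed by the subset of `second`."""
    return select_subset(first, kind1) + select_subset(second, kind2)


a = list(range(8))        # first register: elements 0..7
b = list(range(10, 18))   # second register: elements 10..17
print(combine_subsets(a, b, "even", "odd"))     # [0, 2, 4, 6, 11, 13, 15, 17]
print(combine_subsets(a, b, "lower", "upper"))  # [0, 1, 2, 3, 14, 15, 16, 17]
```

Each register contributes exactly half of its elements, so the third register ends up the same width as each source register.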

As discussed above with respect to claim 1, neither Agarwal nor Cho teaches or
suggests these claimed features. Accordingly, claim 52 is patentable over the combination
of Agarwal and Cho for the same reasons discussed above with respect to claim 1.
Reconsideration and withdrawal of the rejection of claim 52 is thus respectfully requested.

Other Matters 

The Examiner has rejected claims 1, 49, and 52 under the doctrine of nonstatutory
double patenting. To accommodate this rejection, Applicants are submitting
herewith a properly executed terminal disclaimer in accordance with 37 C.F.R. § 1.321(c).
In view of this terminal disclaimer, reconsideration and withdrawal of this rejection are
respectfully requested.



Conclusion

All of the stated grounds of objection and rejection have been properly traversed,
accommodated, or rendered moot. Applicant(s) therefore respectfully request(s) that the
Examiner reconsider all presently outstanding objections and rejections and that they be
withdrawn. Applicant(s) believe(s) that a full and complete reply has been made to the
outstanding Office Action and, as such, the present application is in condition for allowance.
If the Examiner believes, for any reason, that personal communication will expedite
prosecution of this application, the Examiner is invited to telephone the undersigned at the
number provided.



Respectfully submitted,

Sterne, Kessler, Goldstein & Fox p.l.l.c.

Michael B. Ray
Attorney for Applicants
Registration No. 33,997

Date:

1100 New York Avenue, N.W.
Suite 600
Washington, D.C. 20005-3934
(202) 371-2600
Version with markings to show changes made

1. (once amended) In a computer system including a processor having a plurality of
registers, a method for generating an aligned vector of first width from two second width
vectors for single instruction multiple data (SIMD) processing, comprising the steps of:

loading a first vector from a memory unit into a first register, wherein the
first vector contains a first byte of [an] the aligned vector to be generated;

loading a second vector from the memory unit into a second register; 

determining a starting byte in the first register wherein the starting byte 
specifies the first byte of [an] the aligned vector; 

extracting [a first width] the aligned vector from the first register and the 
second register beginning from the first bit in the [first] starting byte of the first register 
continuing through [the] bits in the second register; and 

replicating the [extracted first width] aligned vector into a third register such 
that the third register contains a plurality of elements aligned for SIMD processing. 
In re application of:
Timothy J. van Hook et al, 
Appl. No. 09/662,832 
Filed: September 15, 2000 

For: Alignment and Ordering of 
Vector Elements for Single 
Instruction Multiple Data 
Processing 



Confirmation No.: 2552 

Art Unit: 2183 

Examiner: Pan, D. 

Atty. Docket: 1778.0100002 
(0055.20US) 



Preliminary Amendment Under 37 C.F.R. § 1.114



Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

This Preliminary Amendment is being filed along with a Request for Continued
Examination Under 37 C.F.R. § 1.114. As payment of the issue fee has not yet been made,
Applicants respectfully submit that filing under 37 C.F.R. § 1.114 is proper.

It is not believed that extensions of time or fees for net addition of claims are 
required beyond those that may otherwise be provided for in documents accompanying this 
paper. However, if additional extensions of time are necessary to prevent abandonment of 
this application, then such extensions of time are hereby petitioned under 37 C.F.R. 
§ 1.136(a), and any fees required therefor (including fees for net addition of claims) are 
hereby authorized to be charged to our Deposit Account No. 19-0036. 

This Amendment is provided in the following format:

(A) A clean version of each replacement paragraph/section/claim along with
clear instructions for entry;

(B) Starting on a separate page, appropriate remarks and arguments. 37
C.F.R. § 1.111 and MPEP 714; and

(C) Starting on a separate page, a marked-up version entitled: "Version with
markings to show changes made."

Amendments 

In the Claims: 

Please add the following new claims: 

54. (new) The method as recited in claim 1, wherein the first width and second width
are each 64 bits.

55. (new) The method as recited in claim 54, wherein the third register is comprised 
of eight 8-bit elements. 

56. (new) The method as recited in claim 54, wherein the third register is comprised 
of four 16-bit elements. 

57. (new) The method as recited in claim 1, wherein the starting byte is specified as
a variable in a register in an alignment instruction.

58. (new) The method as recited in claim 1, wherein the first vector and the second 
vector are in contiguous locations in the memory unit. 
59. (new) The method as recited in claim 1, wherein the processor operates in a
big-endian byte ordering mode.

60. (new) The method as recited in claim 1, wherein the processor operates in a
little-endian byte ordering mode.

61. (new) The method as recited in claim 1, wherein the first vector and the second
vector are each composed of eight 8-bit elements indexed from 0 to 7, and wherein said 
extracting step comprises: 

extracting elements 4, 5, 6, and 7 from the first register. 

62. (new) The method as recited in claim 61, wherein said replicating step comprises:

replicating an element 0 of the third register from an element 4 of the first register;

replicating, for all bits of an element 1 of the third register, a sign bit of the element
4 of the first register;

replicating an element 2 of the third register from an element 5 of the first register;

replicating, for all bits of an element 3 of the third register, a sign bit of the element
5 of the first register;

replicating an element 4 of the third register from an element 6 of the first register;

replicating, for all bits of an element 5 of the third register, a sign bit of the element
6 of the first register;

replicating an element 6 of the third register from an element 7 of the first register;
and

replicating, for all bits of an element 7 of the third register, a sign bit of the element
7 of the first register.

63. (new) The method as recited in claim 1, wherein the first vector and the second 
vector are each composed of eight 8-bit elements indexed from 0 to 7, and wherein said 
extracting step comprises: 

extracting elements 0, 1, 2 and 3 from the first register.

64. (new) The method as recited in claim 63, wherein said replicating step
comprises:

replicating an element 0 of the third register from an element 0 of the first register;

replicating, for all bits of an element 1 of the third register, a sign bit of the element
0 of the first register;

replicating an element 2 of the third register from an element 1 of the first register;

replicating, for all bits of an element 3 of the third register, a sign bit of the element
1 of the first register;

replicating an element 4 of the third register from an element 2 of the first register;

replicating, for all bits of an element 5 of the third register, a sign bit of the element
2 of the first register;

replicating an element 6 of the third register from an element 3 of the first register;
and

replicating, for all bits of an element 7 of the third register, a sign bit of the element
3 of the first register.
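Claims 62 and 64 can be read as placing each selected byte of the first register in an even-numbered element of the third register and filling the following odd-numbered element with copies of that byte's sign bit. The hypothetical Python sketch below models the registers as lists of eight unsigned 8-bit values under that reading; the function names are invented for illustration and are not taken from the specification.

```python
from typing import List, Sequence


def replicate_with_sign(first: Sequence[int], source: Sequence[int]) -> List[int]:
    """Element 2*i of the result copies first[source[i]]; element 2*i+1 has every
    bit set to that byte's sign bit (0xFF if bit 7 is set, otherwise 0x00)."""
    third: List[int] = []
    for idx in source:
        value = first[idx] & 0xFF
        sign_fill = 0xFF if value & 0x80 else 0x00
        third.extend([value, sign_fill])
    return third


def replicate_upper(first: Sequence[int]) -> List[int]:
    return replicate_with_sign(first, [4, 5, 6, 7])  # reading of claims 61-62


def replicate_lower(first: Sequence[int]) -> List[int]:
    return replicate_with_sign(first, [0, 1, 2, 3])  # reading of claims 63-64


reg = [0x01, 0x82, 0x7F, 0x80, 0x05, 0xFF, 0x10, 0x90]
print([hex(x) for x in replicate_upper(reg)])
# ['0x5', '0x0', '0xff', '0xff', '0x10', '0x0', '0x90', '0xff']
```

If each pair (element 2i, element 2i+1) is viewed as a 16-bit lane with element 2i as the low-order byte, the result is the sign extension of four 8-bit values to 16 bits. That lane interpretation is an assumption; the claims themselves define the result element by element.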
Remarks



Upon entry of the foregoing amendment, claims 1 and 49-64 are pending in the 
application. Claims 1 and 49-53 have been allowed. By the foregoing, claims 54-64 have 
been added. These changes are believed to introduce no new matter, and their entry is 
respectfully requested. 

Applicants note for the benefit of the Examiner that the terms "first width vector" 
and "aligned vector" are used synonymously herein. Claim 1 as originally filed in this 
application recited at line 10 the term "first width vector." Applicants amended claim 1 in 
a Reply filed on January 24, 2001, to replace the term "first width vector" with the term 
"aligned vector." This amendment was made so that the same term used in other places of 
claim 1 (i.e., see the term "aligned vector" used in claim 1 at lines 2, 6 and 9) was also used 
in line 10. For purposes of clarity, however, Applicants note that the terms are used herein 
synonymously. Thus, the amendment is not intended to narrow the scope of claim 1.

This interpretation of the terms "first width vector" and "aligned vector" is fully 
supported by claim 1 as originally filed and by the specification. See, for example, page 10, 
lines 7-10, which states: "The alignment unit 312 receives two vectors from two source
registers such as vs 306 and vt 308. Then, the alignment unit 312 extracts an aligned vector
from the two vectors and stores it into a destination register such as vd 310." Note that the
term "aligned vector" in this text from the specification has the same meaning as the term
"first width vector" from original claim 1.
Favorable consideration of all pending claims is respectfully requested. The
Examiner is invited to telephone the undersigned representative with any questions or 
comments or if he believes that an interview might be useful for any reason. 
Reconsideration of this application and entry of the above Amendments are respectfully 
requested. 



Respectfully submitted, 

Sterne, Kessler, Goldstein & Fox p.l.l.c. 




Michael B. Ray 
Attorney for Applicants 
Registration No. 33,997 



Date: 




1100 New York Avenue, N.W.
Suite 600 

Washington, D.C. 20005-3934 
(202) 371-2600 
In the Claims:

Please add new claims 54-64. 
Re: U.S. Patent Nos. 5,933,650 & 6,266,758; Issued: August 3, 1999 & July 24, 2001
U.S. Patent Application No. 09/662,832; Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for Single Instruction 
Multiple Data Processing 

Inventors: Van Hook et al.

Our Refs: 1778.0100000, 1778.0100001, and 1778.0100002 



Dear Mr. Huffman:

Our law firm is handling a family of patents on which you were named as an inventor. 
This patent family specifically includes the following: 

• U.S. Patent No. 5,933,650, issued August 3, 1999; 

• U.S. Patent No. 6,266,758, issued July 24, 2001; and

• U.S. Patent Application No. 09/662,832, filed September 15, 2000. 

Each of these documents is entitled "Alignment and Ordering of Vector Elements for Single 
Instruction Multiple Data (SIMD) Processing." All are based on the same patent specification. 
The initially filed patent application was owned by Silicon Graphics, Inc. However, the patent 
family is now owned by MIPS Technologies, Inc., our client.

When the initial patent application was filed, the law firm that handled the filing 
requested status under 37 C.F.R. § 1.47. However, the request did not adequately meet the 
requirements set forth in this Rule. Rule 1.47, if followed properly, can allow an application to
be filed even though one or more of the inventors refuses to sign the Declaration or cannot be
reached. Because the inadequate request for Rule 1.47 status in the initial application was not
discovered until recently, it has been carried through to the subsequently filed applications of this 
patent family. 
William A. Huffman
September 22, 2004 
Page 2 

We recently brought this error to the attention of the U.S. Patent and Trademark Office 
(USPTO) and asked for direction in correcting it. In response, the USPTO has directed us to 
contact the inventors and ask each to sign new Declarations, one for each of the patent 
documents listed above. 

Enclosed you will find the following documents: 

1. The original patent application specification, as filed on October 9, 1997 
(U.S. Patent Application No. 08/947,649), on which all three patent 
documents are based; 

2. U.S. Patent No. 5,933,650, issued August 3, 1999 (from U.S. Patent 
Application No. 08/947,649, filed October 9, 1997), which includes an 
initial set of allowed claims; 

3. U.S. Patent No. 6,266,758, issued July 24, 2001 (from U.S. Patent
Application No. 09/263,798, filed March 5, 1999), which includes a 
second set of allowed claims; 

4. A list of currently pending allowed claims for U.S. Patent Application No. 
09/662,832; 

5. A copy of the original, executed Declaration and Power of Attorney for 
Patent Application (in five parts) as originally filed in U.S. Patent 
Application No. 08/947,649 (now U.S. Patent No. 5,933,650);

6. A new Declaration for U.S. Patent No. 5,933,650; 

7. A new Declaration for U.S. Patent No. 6,266,758; 

8. A new Declaration for U.S. Patent Application No. 09/662,832; and 

9. A copy of 37 C.F.R. § 10.18(b) and (c). 

We ask that you please review these documents, with particular attention directed toward 
the allowed claims for each patent document. 

A Declaration for a patent application is a document that: 1) confirms each inventor's 
residence, mailing address, and citizenship; 2) certifies that each inventor contributed to at least 
one claim of the claimed subject matter; 3) certifies that the specification and claims have been 
reviewed and are understood; and 4) certifies that each inventor acknowledges the duty to 
disclose information that is material to patentability. 

Please carefully review the three new Declaration documents and any information that we 
have entered onto them. Your "residence" address should be your city and state of residence, or, 
if you reside outside the United States, the city and country of residence. The "mailing" address 
is the (full) address at which you customarily receive mail. Either your home or business address 
is an acceptable mailing address. Please make any corrections, if necessary, in blue ink and then 
initial and date in the margin. Once the information on the Declarations is complete and 
correct, and after your review of the application and the allowed claims for each invention, 
please sign and date each Declaration in blue ink where indicated. 

Every person who signs a document that is submitted to the USPTO makes a certification 
under 37 C.F.R. § 10.18(b) and (c). Therefore, a copy of this rule is also enclosed for your 
review. 

For your convenience, we have provided a self-addressed, stamped return envelope for 
returning the signed Declarations to us. We ask that you attend to this matter as soon as possible. 

Because U.S. Patent Application No. 09/662,832 has not yet issued as a patent, it is our 
obligation to remind you that a duty of disclosure continues throughout the entire patent 
application process and ends only with the actual issuance of a patent. Therefore, if you have or 
become aware of any information that might be considered material to patentability, please 
forward it to us immediately. 

We, along with our client, greatly appreciate your assistance with this matter, and look 
forward to your return of the executed Declarations. In the meantime, if you have any comments
or questions regarding this matter, please do not hesitate to contact us. 



Very truly yours,




Sterne, Kessler, Goldstein & Fox p.l.l.c.
Writer's Direct Number:

(202) 772-8629

Internet address: 

D0NF@SKGF.COM 



Henry P. Moreton
140 Phillip Road 
Woodside, CA 94062-2625 



Via Federal Express 



Re: U.S. Patent Nos. 5,933,650 & 6,266,758; Issued: August 3, 1999 & July 24, 2001 
U.S. Patent Application No. 09/662,832; Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for Single Instruction 
Multiple Data Processing 

Inventors: Van Hook et al. 

Our Refs: 1778.0100000, 1778.0100001, and 1778.0100002 



Dear Mr. Moreton: 

Our law firm is handling a family of patents on which you were named as an inventor. 
This patent family specifically includes the following: 

• U.S. Patent No. 5,933,650, issued August 3, 1999; 

• U.S. Patent No. 6,266,758, issued July 24, 2001; and

• U.S. Patent Application No. 09/662,832, filed September 15, 2000. 

Each of these documents is entitled "Alignment and Ordering of Vector Elements for Single
Instruction Multiple Data (SIMD) Processing." All are based on the same patent specification. 
The initially filed patent application was owned by Silicon Graphics, Inc. However, the patent 
family is now owned by MIPS Technologies, Inc., our client. 

When the initial patent application was filed, the law firm that handled the filing 
requested status under 37 C.F.R. § 1.47. However, the request did not adequately meet the 
requirements set forth in this Rule. Rule 1.47, if followed properly, can allow an application to
be filed even though one or more of the inventors refuses to sign the Declaration or cannot be
reached. Because the inadequate request for Rule 1.47 status in the initial application was not
discovered until recently, it has been carried through to the subsequently filed applications of this 
patent family. 
We recently brought this error to the attention of the U.S. Patent and Trademark Office
(USPTO) and asked for direction in correcting it. In response, the USPTO has directed us to 
contact the inventors and ask each to sign new Declarations, one for each of the patent 
documents listed above. 

Enclosed you will find the following documents: 

1. The original patent application specification, as filed on October 9, 1997 
(U.S. Patent Application No. 08/947,649), on which all three patent 
documents are based; 

2. U.S. Patent No. 5,933,650, issued August 3, 1999 (from U.S. Patent 
Application No. 08/947,649, filed October 9, 1997), which includes an 
initial set of allowed claims; 

3. U.S. Patent No. 6,266,758, issued July 24, 2001 (from U.S. Patent
Application No. 09/263,798, filed March 5, 1999), which includes a 
second set of allowed claims; 

4. A list of currently pending allowed claims for U.S. Patent Application No. 
09/662,832; 

5. A copy of the original, executed Declaration and Power of Attorney for 
Patent Application (in five parts) as originally filed in U.S. Patent 
Application No. 08/947,649 (now U.S. Patent No. 5,933,650);

6. A new Declaration for U.S. Patent No. 5,933,650; 

7. A new Declaration for U.S. Patent No. 6,266,758; 

8. A new Declaration for U.S. Patent Application No. 09/662,832; and 

9. A copy of 37 C.F.R. § 10.18(b) and (c). 

We ask that you please review these documents, with particular attention directed toward 
the allowed claims for each patent document. 

A Declaration for a patent application is a document that: 1) confirms each inventor's 
residence, mailing address, and citizenship; 2) certifies that each inventor contributed to at least 
one claim of the claimed subject matter; 3) certifies that the specification and claims have been 
reviewed and are understood; and 4) certifies that each inventor acknowledges the duty to 
disclose information that is material to patentability. 
Please carefully review the three new Declaration documents and any information that we
have entered onto them. Your "residence" address should be your city and state of residence, or, 
if you reside outside the United States, the city and country of residence. The "mailing" address 
is the (full) address at which you customarily receive mail. Either your home or business address 
is an acceptable mailing address. Please make any corrections, if necessary, in blue ink and then 
initial and date in the margin. Once the information on the Declarations is complete and 
correct, and after your review of the application and the allowed claims for each invention, 
please sign and date each Declaration in blue ink where indicated. 

Every person who signs a document that is submitted to the USPTO makes a certification 
under 37 C.F.R. § 10.18(b) and (c). Therefore, a copy of this rule is also enclosed for your 
review. 

For your convenience, we have provided a self-addressed, stamped return envelope for 
returning the signed Declarations to us. We ask that you attend to this matter as soon as possible. 

Because U.S. Patent Application No. 09/662,832 has not yet issued as a patent, it is our 
obligation to remind you that a duty of disclosure continues throughout the entire patent 
application process and ends only with the actual issuance of a patent. Therefore, if you have or
become aware of any information that might be considered material to patentability, please 
forward it to us immediately. 

We, along with our client, greatly appreciate your assistance with this matter, and look 
forward to your return of the executed Declarations. In the meantime, if you have any comments
or questions regarding this matter, please do not hesitate to contact us. 



Very truly yours,




Sterne, Kessler, Goldstein & Fox p.l.l.c. 
Timothy J. Van Hook
224 Oakgrove Avenue 
Atherton, CA 94027 



September 22, 2004 



Writer's Direct Number:

(202) 772-8629 

Internet Address:

DONF@SKGF.COM

Via Federal Express 



Re: U.S. Patent Nos. 5,933,650 & 6,266,758; Issued: August 3, 1999 & July 24, 2001 
U.S. Patent Application No. 09/662,832; Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for Single Instruction 
Multiple Data Processing 

Inventors: Van Hook et al.

Our Refs: 1778.0100000, 1778.0100001, and 1778.0100002 



Dear Mr. Van Hook: 

Our law firm is handling a family of patents on which you were named as an inventor. 
This patent family specifically includes the following: 

• U.S. Patent No. 5,933,650, issued August 3, 1999;

• U.S. Patent No. 6,266,758, issued July 24, 2001; and 

• U.S. Patent Application No. 09/662,832, filed September 15, 2000. 

Each of these documents is entitled "Alignment and Ordering of Vector Elements for Single 
Instruction Multiple Data (SIMD) Processing." All are based on the same patent specification. 
The initially filed patent application was owned by Silicon Graphics, Inc. However, the patent 
family is now owned by MIPS Technologies, Inc., our client.

When the initial patent application was filed, the law firm that handled the filing 
requested status under 37 C.F.R. § 1.47. However, the request did not adequately meet the 
requirements set forth in this Rule. Rule 1.47, if followed properly, can allow an application to 
be filed even though one or more of the inventors refuses to sign the Declaration or cannot be 
reached. Because the inadequate request for Rule 1.47 Status in the initial application was not 
discovered until recently, it has been carried through to the subsequently filed applications of this 
patent family. 
We recently brought this error to the attention of the U.S. Patent and Trademark Office 
(USPTO) and asked for direction in correcting it. In response, the USPTO has directed us to 
contact the inventors and ask each to sign new Declarations, one for each of the patent 
documents listed above. 

Enclosed you will find the following documents: 

1. The original patent application specification, as filed on October 9, 1997 
(U.S. Patent Application No. 08/947,649), on which all three patent 
documents are based; 

2. U.S. Patent No. 5,933,650, issued August 3, 1999 (from U.S. Patent 
Application No. 08/947,649, filed October 9, 1997), which includes an 
initial set of allowed claims; 

3. U.S. Patent No. 6,266,758, issued July 24, 2001 (from U.S. Patent 
Application No. 09/263,798, filed March 5, 1999), which includes a 
second set of allowed claims; 

4. A list of currently pending allowed claims for U.S. Patent Application No. 
09/662,832; 

5. A copy of the original, executed Declaration and Power of Attorney for 
Patent Application (in five parts) as originally filed in U.S. Patent 
Application No. 08/947,649 (now U.S. Patent No. 5,933,650); 

6. A new Declaration for U.S. Patent No. 5,933,650; 

7. A new Declaration for U.S. Patent No. 6,266,758; 

8. A new Declaration for U.S. Patent Application No. 09/662,832; and 

9. A copy of 37 C.F.R. § 10.18(b) and (c). 

We ask that you please review these documents, with particular attention directed toward 
the allowed claims for each patent document. 

A Declaration for a patent application is a document that: 1) confirms each inventor's 
residence, mailing address, and citizenship; 2) certifies that each inventor contributed to at least 
one claim of the claimed subject matter; 3) certifies that the specification and claims have been 
reviewed and are understood; and 4) certifies that each inventor acknowledges the duty to 
disclose information that is material to patentability. 
Please carefully review the three new Declaration documents and any information that we 
have entered onto them. Your "residence" address should be your city and state of residence, or, 
if you reside outside the United States, the city and country of residence. The "mailing" address 
is the (full) address at which you customarily receive mail. Either your home or business address 
is an acceptable mailing address. Please make any corrections, if necessary, in blue ink and then 
initial and date in the margin. Once the information on the Declarations is complete and 
correct, and after your review of the application and the allowed claims for each invention, 
please sign and date each Declaration in blue ink where indicated. 

Every person who signs a document that is submitted to the USPTO makes a certification 
under 37 C.F.R. § 10.18(b) and (c). Therefore, a copy of this rule is also enclosed for your 
review. 

For your convenience, we have provided a self-addressed, stamped return envelope for 
returning the signed Declarations to us. We ask that you attend to this matter as soon as possible. 

Because U.S. Patent Application No. 09/662,832 has not yet issued as a patent, it is our 
obligation to remind you that a duty of disclosure continues throughout the entire patent 
application process and ends only with the actual issuance of a patent. Therefore, if you have or 
become aware of any information that might be considered material to patentability, please 
forward it to us immediately. 

We, along with our client, greatly appreciate your assistance with this matter, and look 
forward to your return of the executed Declarations. In the meantime, if you have any comments 
or questions regarding this matter, please do not hesitate to contact us. 



Very truly yours, 



Sterne, Kessler, Goldstein & Fox p.l.l.c. 




Donald J. Featherstone 
Re: U.S. Patent Nos. 5,933,650 & 6,266,758; Issued: August 3, 1999 & July 24, 2001 
U.S. Patent Application No. 09/662,832; Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for Single Instruction 
Multiple Data Processing 

Inventors: Van Hook et al 

Our Refs: 1778.0100000, 1778.0100001, and 1778.0100002 



Dear Mr. Killian: 

Our law firm is handling a family of patents on which you were named as an inventor. 
This patent family specifically includes the following: 

• U.S. Patent No. 5,933,650, issued August 3, 1999; 

• U.S. Patent No. 6,266,758, issued July 24, 2001; and 

• U.S. Patent Application No. 09/662,832, filed September 15, 2000. 

Each of these documents is entitled "Alignment and Ordering of Vector Elements for Single 
Instruction Multiple Data (SIMD) Processing." All are based on the same patent specification. 
The initially filed patent application was owned by Silicon Graphics, Inc. However, the patent 
family is now owned by MIPS Technologies, Inc., our client. 

When the initial patent application was filed, the law firm that handled the filing 
requested status under 37 C.F.R. § 1.47. However, the request did not adequately meet the 
requirements set forth in this Rule. Rule 1.47, if followed properly, can allow an application to 
be filed even though one or more of the inventors refuses to sign the Declaration or cannot be 
reached. Because the inadequate request for Rule 1.47 Status in the initial application was not 
discovered until recently, it has been carried through to the subsequently filed applications of this 
patent family. 
We recently brought this error to the attention of the U.S. Patent and Trademark Office 
(USPTO) and asked for direction in correcting it. In response, the USPTO has directed us to 
contact the inventors and ask each to sign new Declarations, one for each of the patent 
documents listed above. 

Enclosed you will find the following documents: 

1. The original patent application specification, as filed on October 9, 1997 
(U.S. Patent Application No. 08/947,649), on which all three patent 
documents are based; 

2. U.S. Patent No. 5,933,650, issued August 3, 1999 (from U.S. Patent 
Application No. 08/947,649, filed October 9, 1997), which includes an 
initial set of allowed claims; 

3. U.S. Patent No. 6,266,758, issued July 24, 2001 (from U.S. Patent 
Application No. 09/263,798, filed March 5, 1999), which includes a 
second set of allowed claims; 

4. A list of currently pending allowed claims for U.S. Patent Application No. 
09/662,832; 

5. A copy of the original, executed Declaration and Power of Attorney for 
Patent Application (in five parts) as originally filed in U.S. Patent 
Application No. 08/947,649 (now U.S. Patent No. 5,933,650); 

6. A new Declaration for U.S. Patent No. 5,933,650; 

7. A new Declaration for U.S. Patent No. 6,266,758; 

8. A new Declaration for U.S. Patent Application No. 09/662,832; and 

9. A copy of 37 C.F.R. § 10.18(b) and (c). 

We ask that you please review these documents, with particular attention directed toward 
the allowed claims for each patent document. 

A Declaration for a patent application is a document that: 1) confirms each inventor's 
residence, mailing address, and citizenship; 2) certifies that each inventor contributed to at least 
one claim of the claimed subject matter; 3) certifies that the specification and claims have been 
reviewed and are understood; and 4) certifies that each inventor acknowledges the duty to 
disclose information that is material to patentability. 
Please carefully review the three new Declaration documents and any information that we 
have entered onto them. Your "residence" address should be your city and state of residence, or, 
if you reside outside the United States, the city and country of residence. The "mailing" address 
is the (full) address at which you customarily receive mail. Either your home or business address 
is an acceptable mailing address. Please make any corrections, if necessary, in blue ink and then 
initial and date in the margin. Once the information on the Declarations is complete and 
correct, and after your review of the application and the allowed claims for each invention, 
please sign and date each Declaration in blue ink where indicated. 

Every person who signs a document that is submitted to the USPTO makes a certification 
under 37 C.F.R. § 10.18(b) and (c). Therefore, a copy of this rule is also enclosed for your 
review. 

For your convenience, we have provided a self-addressed, stamped return envelope for 
returning the signed Declarations to us. We ask that you attend to this matter as soon as possible. 

Because U.S. Patent Application No. 09/662,832 has not yet issued as a patent, it is our 
obligation to remind you that a duty of disclosure continues throughout the entire patent 
application process and ends only with the actual issuance of a patent. Therefore, if you have or 
become aware of any information that might be considered material to patentability, please 
forward it to us immediately. 

We, along with our client, greatly appreciate your assistance with this matter, and look 
forward to your return of the executed Declarations. In the meantime, if you have any comments 
or questions regarding this matter, please do not hesitate to contact us. 



Sterne, Kessler, Goldstein & Fox p.l.l.c. : 1100 New York Avenue, NW : Washington, DC 20005 : 202.371.2600 f 202.371.2540 : www.skgf.com 



Very truly yours, 



Sterne, Kessler, Goldstein & Fox p.l.l.c. 




Donald J. Featherstone 
(202) 772-8629 

Internet Address: 

DONF@SKGF.COM 

Peter Yan-Tek Hsu 
1 Rausch Street, Unit F 
San Francisco, CA 94103 

Via Federal Express 



Re: U.S. Patent Nos. 5,933,650 & 6,266,758; Issued: August 3, 1999 & July 24, 2001 
U.S. Patent Application No. 09/662,832; Filed: September 15, 2000 
For: Alignment and Ordering of Vector Elements for Single Instruction 
Multiple Data Processing 

Inventors: Van Hook et al 

Our Refs: 1778.0100000, 1778.0100001, and 1778.0100002 



Dear Mr. Hsu: 

Our law firm is handling a family of patents on which you were named as an inventor. 
This patent family specifically includes the following: 

• U.S. Patent No. 5,933,650, issued August 3, 1999; 

• U.S. Patent No. 6,266,758, issued July 24, 2001; and 

• U.S. Patent Application No. 09/662,832, filed September 15, 2000. 

Each of these documents is entitled "Alignment and Ordering of Vector Elements for Single 
Instruction Multiple Data (SIMD) Processing." All are based on the same patent specification. 
The initially filed patent application was owned by Silicon Graphics, Inc. However, the patent 
family is now owned by MIPS Technologies, Inc., our client. 

When the initial patent application was filed, the law firm that handled the filing 
requested status under 37 C.F.R. § 1.47. However, the request did not adequately meet the 
requirements set forth in this Rule. Rule 1.47, if followed properly, can allow an application to 
be filed even though one or more of the inventors refuses to sign the Declaration or cannot be 
reached. Because the inadequate request for Rule 1.47 Status in the initial application was not 
discovered until recently, it has been carried through to the subsequently filed applications of this 
patent family. 
We recently brought this error to the attention of the U.S. Patent and Trademark Office 
(USPTO) and asked for direction in correcting it. In response, the USPTO has directed us to 
contact the inventors and ask each to sign new Declarations, one for each of the patent 
documents listed above. 

Enclosed you will find the following documents: 

1. The original patent application specification, as filed on October 9, 1997 
(U.S. Patent Application No. 08/947,649), on which all three patent 
documents are based; 

2. U.S. Patent No. 5,933,650, issued August 3, 1999 (from U.S. Patent 
Application No. 08/947,649, filed October 9, 1997), which includes an 
initial set of allowed claims; 

3. U.S. Patent No. 6,266,758, issued July 24, 2001 (from U.S. Patent 
Application No. 09/263,798, filed March 5, 1999), which includes a 
second set of allowed claims; 

4. A list of currently pending allowed claims for U.S. Patent Application No. 
09/662,832; 

5. A copy of the original, executed Declaration and Power of Attorney for 
Patent Application (in five parts) as originally filed in U.S. Patent 
Application No. 08/947,649 (now U.S. Patent No. 5,933,650); 

6. A new Declaration for U.S. Patent No. 5,933,650; 

7. A new Declaration for U.S. Patent No. 6,266,758; 

8. A new Declaration for U.S. Patent Application No. 09/662,832; and 

9. A copy of 37 C.F.R. § 10.18(b) and (c). 

We ask that you please review these documents, with particular attention directed toward 
the allowed claims for each patent document. 

A Declaration for a patent application is a document that: 1) confirms each inventor's 
residence, mailing address, and citizenship; 2) certifies that each inventor contributed to at least 
one claim of the claimed subject matter; 3) certifies that the specification and claims have been 
reviewed and are understood; and 4) certifies that each inventor acknowledges the duty to 
disclose information that is material to patentability. 
Please carefully review the three new Declaration documents and any information that we 
have entered onto them. Your "residence" address should be your city and state of residence, or, 
if you reside outside the United States, the city and country of residence. The "mailing" address 
is the (full) address at which you customarily receive mail. Either your home or business address 
is an acceptable mailing address. Please make any corrections, if necessary, in blue ink and then 
initial and date in the margin. Once the information on the Declarations is complete and 
correct, and after your review of the application and the allowed claims for each invention, 
please sign and date each Declaration in blue ink where indicated. 

Every person who signs a document that is submitted to the USPTO makes a certification 
under 37 C.F.R. § 10.18(b) and (c). Therefore, a copy of this rule is also enclosed for your 
review. 

For your convenience, we have provided a self-addressed, stamped return envelope for 
returning the signed Declarations to us. We ask that you attend to this matter as soon as possible. 

Because U.S. Patent Application No. 09/662,832 has not yet issued as a patent, it is our 
obligation to remind you that a duty of disclosure continues throughout the entire patent 
application process and ends only with the actual issuance of a patent. Therefore, if you have or 
become aware of any information that might be considered material to patentability, please 
forward it to us immediately. 

We, along with our client, greatly appreciate your assistance with this matter, and look 
forward to your return of the executed Declarations. In the meantime, if you have any comments 
or questions regarding this matter, please do not hesitate to contact us. 



Very truly yours, 
ALIGNMENT AND ORDERING OF VECTOR ELEMENTS FOR SINGLE 
INSTRUCTION MULTIPLE DATA PROCESSING 



FIELD OF THE INVENTION 
The present invention relates to the field of single instruction multiple 
data (SIMD) vector processing. More particularly, the present claimed 
invention relates to the alignment and ordering of vector elements for SIMD 
processing. 

BACKGROUND ART 

Today, most processors in microcomputer systems provide a 64-bit 
wide datapath architecture. The 64-bit datapath allows operations such as 
read, write, add, subtract, and multiply on the entire 64 bits of data at once. 
However, for many applications the types of data involved simply do not 
require the full 64 bits. In media signal processing (MDMX) applications, for 
example, light and sound values are usually represented in 8-, 12-, 16-, or 24- 
bit numbers, because people typically are not able to distinguish levels of 
light and sound beyond the levels represented by these numbers of bits. 
Hence, data types in MDMX applications typically require less than the 
full 64 bits provided in the datapath in most computer systems. 

To efficiently utilize the entire datapath, the current generation of 
processors typically utilizes a single instruction multiple data (SIMD) method. 
According to this method, a multitude of smaller numbers are packed into 
the 64-bit doubleword as elements, each of which is then operated on 
independently and in parallel. Prior Art Figure 1 illustrates an exemplary 
single instruction multiple data (SIMD) method. Registers vs and vt in a 
processor are of 64-bit width. Each register is packed with four 16-bit data 
elements fetched from memory: register vs contains vs[0], vs[1], vs[2], and 
vs[3], and register vt contains vt[0], vt[1], vt[2], and vt[3]. The registers in 
essence contain a vector of N elements. To add elements of matching index, 
an add instruction adds, independently, each of the element pairs of matching 
index from vs and vt. A third register, vd, of 64-bit width may be used to 
store the result. For example, vs[0] is added to vt[0] and the result is stored into 
vd[0]. Similarly, vd[1], vd[2], and vd[3] store the sums of the vs and vt elements of 
corresponding indexes. Hence, a single add operation on the 64-bit vector 
results in 4 simultaneous additions on each of the 16-bit elements. On the 
other hand, if 8-bit elements were packed into the registers, one add operation 
would perform 8 independent additions in parallel. Consequently, when a SIMD 
arithmetic instruction such as addition, subtraction, or multiplication is performed 
on the data in the 64-bit datapath, the operation actually performs multiple 
operations independently and in parallel on each of the smaller 
elements comprising the 64-bit datapath. In SIMD vector operations, 
processors typically require loads to be aligned to the 64-bit doubleword 
data type size. This alignment ensures that the SIMD vector 
operations occur on 64-bit doubleword boundaries. 
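
The lane-wise addition described above can be modeled in software. The sketch below is an illustrative assumption, not the processor's implementation: it packs four 16-bit elements into one Python integer (element 0 in the least significant bits, an assumed convention) and uses the well-known SWAR (SIMD within a register) masking trick so that a single wide addition yields four independent sums, each wrapping modulo 2^16 with no carry crossing into a neighboring element. The names `pack`, `unpack`, and `simd_add` are hypothetical.

```python
MASK64 = (1 << 64) - 1
LANE_BITS = 16
LANES = 64 // LANE_BITS
LANE_MASK = (1 << LANE_BITS) - 1
# Bit 15 of every 16-bit lane; clearing these bits before the add keeps
# carries from propagating across lane boundaries.
HIGH_BITS = 0x8000_8000_8000_8000


def pack(elements):
    """Pack four 16-bit values into a 64-bit word (element 0 = lowest bits)."""
    word = 0
    for i, value in enumerate(elements):
        word |= (value & LANE_MASK) << (LANE_BITS * i)
    return word


def unpack(word):
    """Split a 64-bit word into its four 16-bit elements."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def simd_add(vs, vt):
    """vd[i] = (vs[i] + vt[i]) mod 2**16 for all four lanes in one addition."""
    # Add the low 15 bits of every lane; each partial sum fits in 16 bits,
    # so no carry leaves its lane.
    partial = (vs & ~HIGH_BITS) + (vt & ~HIGH_BITS)
    # Bit 15 of each lane is the XOR of both inputs' bit 15 and the carry in.
    return (partial ^ ((vs ^ vt) & HIGH_BITS)) & MASK64


vs = pack([1, 2, 3, 0xFFFF])
vt = pack([10, 20, 30, 1])
print(unpack(simd_add(vs, vt)))  # [11, 22, 33, 0]
```

With 8-bit elements the same idea applies using `HIGH_BITS = 0x8080_8080_8080_8080`, giving eight independent additions per 64-bit operation.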


Unfortunately, the elements within application data vectors are 
frequently not 64-bit doubleword aligned for SIMD operations. For example, 
data elements stored in a memory unit are loaded into registers in chunks 
such as 64-bit doublewords. To operate on the individual elements, 
the elements are loaded into a register. The order of the elements in the 
register remains the same as their order in the original memory. Accordingly, 
the elements may not be properly aligned for a SIMD operation. 
Traditionally, when elements are not aligned with a proper boundary 
as required for a SIMD vector operation, the non-aligned vector processing 
has typically been reduced to scalar processing. That is, operations took 
place one element at a time instead of as simultaneous multiple operations. 
Consequently, SIMD vector operations lost their parallelism and performance 
advantages when the vector elements were not properly aligned. 

Furthermore, many media applications require a specific ordering for 
the elements within a SIMD vector. Since elements necessary for SIMD 
processing are commonly stored in multiple 64-bit doublewords with other 
elements, these elements need to be selected and assembled into a vector of 
the desired order. For example, multiple-channel data are commonly stored in 
separate arrays or interleaved in a single array. Processing the data requires 
interleaving or deinterleaving the multiple channels. Other applications 
require SIMD vector operations on transposed two-dimensional arrays of data. 
Yet other applications reverse the order of elements in an array, as in FFT, 
DCT, and convolution algorithms. 
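
As a rough illustration of the reordering these applications need, the sketch below treats a 64-bit doubleword as four 16-bit elements (element 0 in the least significant bits, an assumed convention) and shows two of the orderings mentioned above: reversing the element order, and deinterleaving two-channel data stored as alternating samples L0, R0, L1, R1, and so on across two doublewords. The function names and the left/right channel layout are hypothetical, chosen only for the example.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def to_elements(word):
    """Split a 64-bit doubleword into four 16-bit elements, element 0 first."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(4)]


def from_elements(elements):
    """Pack four 16-bit elements back into a 64-bit doubleword."""
    word = 0
    for i, value in enumerate(elements):
        word |= (value & LANE_MASK) << (LANE_BITS * i)
    return word


def reverse_elements(word):
    """Reverse element order: result[i] = word[3 - i]."""
    return from_elements(to_elements(word)[::-1])


def deinterleave(first, second):
    """Split interleaved samples L0 R0 L1 R1 | L2 R2 L3 R3 held in two
    doublewords into one doubleword of left samples and one of right samples."""
    samples = to_elements(first) + to_elements(second)
    return from_elements(samples[0::2]), from_elements(samples[1::2])


left, right = deinterleave(from_elements([0x10, 0x20, 0x11, 0x21]),
                           from_elements([0x12, 0x22, 0x13, 0x23]))
print([hex(x) for x in to_elements(left)])   # ['0x10', '0x11', '0x12', '0x13']
print([hex(x) for x in to_elements(right)])  # ['0x20', '0x21', '0x22', '0x23']
```

Without a register-level shuffle, each of these reorderings would otherwise be done one element at a time, which is exactly the scalar fallback described earlier.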

Thus, what is needed is a method for aligning and ordering elements 
that enables more efficient SIMD vector operations by providing computational 
parallelism. 
SUMMARY OF THE INVENTION 

The present invention provides alignment and ordering of vector 
elements for SIMD processing. The present invention is implemented in a 
computer system including a processor having a plurality of registers. In the 
alignment of vector elements for SIMD processing, one vector is loaded from 
a memory unit into a first register and another vector is loaded from the 
memory unit into a second register. The first vector contains the first byte of an 
aligned vector to be generated. Then, a starting byte specifying the first byte of 
the aligned vector is determined. Next, a vector is extracted from the first 
register and the second register, beginning from the first bit in the first byte of 
the first register and continuing through the bits in the second register. Finally, 
the extracted vector is replicated into a third register such that the third 
register contains a plurality of elements aligned for SIMD processing. In the 
ordering of vector elements for SIMD processing, a first vector is loaded from 
a memory unit into a first register and a second vector is loaded from the 
memory unit into a second register. Then, a subset of elements is selected 
from the first register and the second register. The elements from the subset 
are then replicated into the elements in a third register in a particular order 
suitable for subsequent SIMD vector processing. 
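
The alignment steps summarized above can be sketched in software. This is a hedged model under stated assumptions, not the claimed hardware: memory is a Python `bytes` object read with big-endian 64-bit loads (so the first byte in memory is the most significant byte of the register), the starting byte is counted from the most significant byte of the first register, and `load_doubleword`, `align_extract`, and `unaligned_load` are hypothetical names. Two aligned loads bracket the desired data, and the aligned vector is the eight bytes that begin at the starting byte of the 16-byte concatenation of the first and second registers.

```python
MASK64 = (1 << 64) - 1


def load_doubleword(memory, addr):
    """Hypothetical aligned, big-endian 64-bit load from a bytes object."""
    if addr % 8 != 0:
        raise ValueError("address must be doubleword aligned")
    if addr + 8 > len(memory):
        raise IndexError("load past end of memory")
    return int.from_bytes(memory[addr:addr + 8], "big")


def align_extract(vs, vt, start_byte):
    """Return the 8 bytes starting at byte `start_byte` (0..7) of the 16-byte
    concatenation vs:vt, counting byte 0 as the most significant byte of vs."""
    if not 0 <= start_byte <= 7:
        raise ValueError("start_byte must be in 0..7")
    combined = (vs << 64) | vt  # 128-bit value: vs in the high half, vt in the low half
    return (combined >> (64 - 8 * start_byte)) & MASK64


def unaligned_load(memory, addr):
    """Build the doubleword at an arbitrary byte address from two aligned loads."""
    base = addr & ~7                        # aligned doubleword holding the first byte
    vs = load_doubleword(memory, base)      # first register: holds the starting byte
    vt = load_doubleword(memory, base + 8)  # second register: the following doubleword
    return align_extract(vs, vt, addr - base)


print(hex(unaligned_load(bytes(range(32)), 3)))  # 0x30405060708090a
```

In this model only aligned loads touch memory; the realignment happens entirely between registers, so the extracted vector can feed subsequent SIMD operations without a scalar fallback.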

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings, which are incorporated in and form a 
part of this specification, illustrate embodiments of the invention and, 
together with the description, serve to explain the principles of the invention: 


Prior Art Figure 1 illustrates an exemplary single instruction multiple 
data (SIMD) instruction method. 

Figure 2 illustrates a block diagram of an exemplary computer system 
for implementing the present invention. 

Figure 3 illustrates a block diagram of an exemplary datapath for 
aligning and ordering vector elements. 

Figure 4 illustrates a block diagram of an alignment unit in a processor 
for aligning a vector of elements. 

Figure 5 illustrates a flow diagram of the steps involved in extracting 
an aligned vector from two exemplary vectors. 

Figure 6A illustrates a block diagram of a full byte-mode crossbar circuit 
used in generating a vector of elements from elements of two vector registers. 

Figure 6B shows a more detailed diagram of the operation of an 
exemplary AND gate associated with element 7 in the first register, vs. 

Figure 7 illustrates shuffle operations for ordering 8-bit elements in a 
64-bit doubleword. 

Figure 8A illustrates a block diagram of a shuffle operation, which 
converts four unsigned upper bytes (i.e., 8 bits) in a source register to four 
16-bit halves in a destination register. 

Figure 8B illustrates a block diagram of a shuffle operation, which 
converts a vector of unsigned low 4 bytes from a source register to four 16-bit 
halves in a destination register. 
Figure 8C illustrates a block diagram of a shuffle operation, which 
converts a vector of signed upper 4 bytes from a source register to four 16-bit 
halves in a destination register by replicating the signs across the upper bytes 
in the halves. 

Figure 8D illustrates a block diagram of a shuffle operation, which 
converts a vector of signed low 4 bytes from a source register to four 16-bit 
halves in a destination register by replicating the signs across the upper bytes 
in the halves. 

Figure 8E illustrates a block diagram of a shuffle operation, which 
replicates the odd elements of 8 8-bit elements from each of two source 
registers into 8 elements in a destination vector register. 

Figure 8F illustrates a block diagram of a shuffle operation, which 
replicates the even elements of 8 8-bit elements from each of two source 
registers into 8 elements in a destination vector register. 

Figure 8G illustrates a block diagram of a shuffle operation, which 
replicates the upper 4 elements of 8 8-bit elements from each of two source 
registers into 8 elements in a destination vector register. 

Figure 8H illustrates a block diagram of a shuffle operation, which 
replicates the lower 4 elements of 8 8-bit elements from each of two source 
registers into 8 elements in a destination vector register. 

Figure 9 illustrates shuffle operations for ordering 16-bit elements in a 
64-bit doubleword. 

Figure 10A illustrates a block diagram of a shuffle operation, which 
replicates the upper 2 elements of 4 16-bit elements from each of two source 
registers into 4 elements in a destination vector register. 
Figure 10B illustrates a block diagram of a shuffle operation, which 
replicates the lower 2 elements of 4 16-bit elements from each of two source 
registers into 4 elements in a destination vector register. 

Figure 10C illustrates a block diagram of a shuffle operation, which 
replicates 2 odd elements of 4 16-bit elements from each of two source 
registers into 4 elements in a destination vector register. 

Figure 10D illustrates a block diagram of a shuffle operation, which 
replicates 2 even elements of 4 16-bit elements from each of two source 
registers into 4 elements in a destination vector register. 

Figure 10E illustrates a block diagram of a shuffle operation, which 
replicates even elements 0 and 2 from one source register into odd elements 1 
and 3 in a destination vector register and further replicates odd elements 1 
and 3 from another source register into the even elements 0 and 2, 
respectively, of the destination vector register. 

Figure 10F illustrates a block diagram of a shuffle operation, which 
replicates even elements 0 and 2 from one source register into odd elements 3 
and 1, respectively, in a destination vector register and further replicates odd 
elements 1 and 3 from another source register into the even elements 2 and 0, 
respectively, of the destination vector register. 

Figure 10G illustrates a block diagram of a shuffle operation, which 
replicates the upper 2 elements of 4 16-bit elements from each of two source 
registers into a destination vector register. 

Figure 10H illustrates a block diagram of a shuffle operation, which 
replicates the lower 2 elements of 4 16-bit elements from each of two source 
registers into a destination vector register. 
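
A few of the 8-bit shuffle operations listed above (Figures 8A, 8C, 8E, and 8F) can be modeled as follows. This sketch rests on assumptions the figure descriptions do not fix: element 0 occupies the least significant byte, the "upper" bytes are elements 4 through 7, and for the two-register shuffles the elements selected from vt fill the lower half of the destination while those from vs fill the upper half. All function names are hypothetical.

```python
def bytes_of(word):
    """Eight 8-bit elements of a 64-bit word, element 0 = least significant."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def pack_bytes(elements):
    """Pack eight 8-bit elements into a 64-bit word."""
    return sum((e & 0xFF) << (8 * i) for i, e in enumerate(elements))


def pack_halves(elements):
    """Pack four 16-bit elements into a 64-bit word."""
    return sum((e & 0xFFFF) << (16 * i) for i, e in enumerate(elements))


def upper_bytes_unsigned(vs):
    """Fig. 8A style: zero-extend bytes 4..7 of vs into four 16-bit halves."""
    return pack_halves(bytes_of(vs)[4:])


def upper_bytes_signed(vs):
    """Fig. 8C style: sign-extend bytes 4..7 of vs into four 16-bit halves
    by replicating each byte's sign bit across the upper byte of its half."""
    return pack_halves([b | 0xFF00 if b & 0x80 else b for b in bytes_of(vs)[4:]])


def odd_elements(vs, vt):
    """Fig. 8E style: odd elements of vt, then odd elements of vs."""
    return pack_bytes(bytes_of(vt)[1::2] + bytes_of(vs)[1::2])


def even_elements(vs, vt):
    """Fig. 8F style: even elements of vt, then even elements of vs."""
    return pack_bytes(bytes_of(vt)[0::2] + bytes_of(vs)[0::2])
```

The low-byte variants (Figures 8B and 8D) follow the same pattern using `bytes_of(vs)[:4]`, and the 16-bit shuffles of Figures 10A through 10H can be modeled the same way with 16-bit elements.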
DESCRIPTION OF THE PREFERRED EMBODIMENTS 

In the following detailed description of the present invention, 
numerous specific details are set forth in order to provide a thorough 
understanding of the present invention. However, it will be obvious to one 
skilled in the art that the present invention may be practiced without these 
specific details. In other instances, well-known methods, procedures, 
components, and circuits have not been described in detail so as not to 
unnecessarily obscure aspects of the present invention. 

The present invention, a method for providing alignment and 
ordering of vector elements for single-instruction multiple-data (SIMD) 
processing, is described. The preferred embodiment of the present invention 
provides elements aligned and ordered for an efficient SIMD vector operation 
in a processor having a 64-bit-wide datapath within an exemplary computer 
system described below. Although such a datapath is exemplified herein, the 
present invention can be readily adapted to suit other datapaths of varying 
widths. 

COMPUTER SYSTEM ENVIRONMENT 

Figure 2 illustrates an exemplary computer system 200 comprised of a 
system bus 206 for communicating information, a processor 202 coupled with 
the bus 206 for processing information and instructions, a computer readable 
volatile memory unit 210 (e.g., random access memory, static RAM, dynamic 
RAM, etc.) coupled with the bus 206 for storing information and instructions 
for the processor 202, a computer readable non-volatile memory unit 208 (e.g., 
read only memory, programmable ROM, flash memory, EPROM, EEPROM, 
etc.) coupled with the bus 206 for storing static information and instructions 



SGM5-4-457.00 



October 7, 1997 



for the processor 202. A vector register file 204 containing a plurality of 
registers is included in the processor 202. In the present invention, the term 
vector register file 204 encompasses any register file containing a plurality of 
registers and as such is not limited to vector register files. 


The computer system 200 of Figure 2 further includes a mass storage 
computer readable data storage device 212 (hard drive, floppy, CD-ROM, 
optical drive, etc.) such as a magnetic or optical disk and disk drive coupled 
with the bus 206 for storing information and instructions. Optionally, the 
computer system 200 may include a display device 214 coupled to the bus 206 
for displaying information to the user, an alphanumeric input device 216 
including alphanumeric and function keys coupled to the bus 206 for 
communicating information and command selections to the processor 202, a 
cursor control device 218 coupled to the bus 206 for communicating user 
input information and command selections to the processor 202, and a signal 
generating device 220 coupled to the bus 206 for communicating command 
selections to the processor 202. 

According to an exemplary embodiment of the present invention, the 
processor 202 includes a SIMD vector unit that functions as a coprocessor for 
or as an extension of the processor 202. The SIMD vector unit performs 
various arithmetic and logical operations on each data element within a 
SIMD vector in parallel. The SIMD vector unit utilizes the register files of the 
processor 202 to hold SIMD vectors. The present invention may include one 
or more SIMD vector units to perform specialized operations such as 
arithmetic operations, logical operations, etc. 
Figure 3 illustrates a block diagram of an exemplary datapath 300 for 
aligning and ordering vector elements. The datapath 300 includes a SIMD 
vector unit 302, an alignment unit 312, a register file 304, a crossbar circuit 314, 
and a vector load/store unit 302. The vector load/store unit 302 performs 
load and store functions. It loads a vector from memory into one of the 
registers in the register file 304. It also stores a vector from one of the registers 
in the register file 304 into main memory. The alignment unit 312 receives 
two vectors from two source registers such as vs 306 and vt 308. Then, the 
alignment unit 312 extracts an aligned vector from the two vectors and stores 
it into a destination register such as vd 310. The crossbar circuit 314 also 
receives two vectors from two exemplary source registers, vs 306 and vt 308. The 
crossbar circuit 314 then selects a set of elements from the source registers and 
routes each of the elements in the selected set to a specified element in the 
exemplary destination register, vd 310. In an alternative embodiment, the 
crossbar circuit 314 may receive one vector from a single source register and 
select a set of elements from the vector. The data path 318 allows a result to 
be forwarded to the register file 304 or to the vector load/store unit to be 
stored into main memory. 

The SIMD vector unit 316 represents a generic SIMD vector processing unit, which may be an arithmetic unit, logical unit, integer unit, etc. The SIMD vector unit 316 may receive either one or two vectors from one or two source registers. It should be appreciated that the present invention may include more than one SIMD vector unit performing various functions. The SIMD vector unit 316 may execute an operation specified in the instruction on each element within a vector in parallel.
The exemplary vector register file 304 preferably comprises 32 64-bit general purpose registers. In the preferred embodiment, the present invention utilizes the floating point registers (FPR) of a floating point unit (FPU) in the processor as its vector registers. In this shared arrangement, data is moved between the vector register file 304 and a memory unit through the vector load/store unit 302. These load and store operations are unformatted. That is, no format conversions are performed, and therefore no floating-point exceptions can occur due to these operations. Similarly, data is moved between the vector register file 304 and the alignment unit 312, the crossbar circuit 314, or the SIMD vector unit 316 without format conversions, and thus no floating-point exception occurs.

The present invention allows data types of 8-, 16-, 32-, or 64-bit fields. Hence, a 64-bit doubleword vector may contain 8 8-bit elements, 4 16-bit elements, 2 32-bit elements, or 1 64-bit element. According to this convention, vector registers of the present invention are interpreted in the following data formats: Quad Half (QH), Oct Byte (OB), Bi Word (BW), and Long (L). In QH format, a vector register is interpreted as having 16-bit elements. For example, a 64-bit vector register is interpreted as a vector of 4 signed 16-bit integers. OB format interprets a vector register as being comprised of 8-bit elements. Hence, an exemplary 64-bit vector register is seen as a vector of 8 unsigned 8-bit integers. In BW format, a vector register is interpreted as having 2 32-bit elements. L format interprets a vector register as having a single 64-bit element. These data types are provided to be adaptable to various register sizes of a processor. As described above, no data format conversion is necessary between these formats and floating-point format.
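As a minimal sketch, assuming element 0 occupies the least significant bits of the register (a numbering convention the text does not fix at this point), the following C code reads one 64-bit register as OB, QH, BW, or L elements using only shifts and masks. The names `vec_fmt`, `elem_count`, `get_elem`, and `get_qh_signed` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Element width in bits for each format: OB, QH, BW, L (hypothetical names). */
enum vec_fmt { FMT_OB = 8, FMT_QH = 16, FMT_BW = 32, FMT_L = 64 };

/* Number of elements one 64-bit register holds in the given format. */
static unsigned elem_count(enum vec_fmt f) {
    return 64u / (unsigned)f;
}

/* Raw (unsigned) value of element i; element 0 is assumed least significant. */
static uint64_t get_elem(uint64_t v, enum vec_fmt f, unsigned i) {
    unsigned w = (unsigned)f;
    if (w == 64u)
        return v;                              /* L format: the whole register */
    uint64_t mask = ((uint64_t)1 << w) - 1u;
    return (v >> (i * w)) & mask;
}

/* QH elements are interpreted as signed 16-bit integers. */
static int32_t get_qh_signed(uint64_t v, unsigned i) {
    uint32_t u = (uint32_t)get_elem(v, FMT_QH, i);
    return (u & 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
}
```

Because only shifts and masks are involved, the same 64 bits can be viewed in any format with no conversion step.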
According to a preferred embodiment of the present invention, exemplary source registers, vs and vt, are each used to hold a set of vector elements. A third exemplary vector register, vd, is created from the source registers and holds a set of elements selected from the source registers. Although the registers vs, vt, and vd are used to associate vector registers with a set of vector elements, other vector registers are equally suitable for the present invention.

LOAD/STORE INSTRUCTIONS 
The load and store instructions of the present invention use a special load/store unit to load and store a 64-bit doubleword between a register in a register file, such as an FPR, and a memory unit. The doubleword is loaded through an exemplary load/store unit 302 illustrated above in Figure 3. The load/store unit performs loading or storing of a doubleword using the upper 61 bits of an effective address. The lowest 3 bits specify a byte address within the 64-bit doubleword for alignment.

According to a preferred embodiment, an effective address is formed by adding an index value held in a general purpose register (GPR) to a base address held in another GPR. The effective address is doubleword aligned: during the loading process, the last three bits of the effective address are ignored by treating them as 0s. Hence, the effective address is comprised of bits 3 to 63. Bits 0 to 2, which would otherwise contain the byte address for accessing individual bytes within a doubleword, are ignored by treating them as 0s. If the size of a register in the register file is 64 bits, then the 64-bit data stored in memory at the effective address is fetched and loaded into the register. If, on the other hand, the size of the register in the register file is 32 bits, then the lower 32 bits of the data are loaded into the vector register and the upper 32 bits of the data are loaded into the next register in sequence. Hence, a pair of 32-bit registers is used to hold 64-bit data from the memory.
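The address formation and the 32-bit register-pair case can be sketched in C as below. This is an illustration under stated assumptions, not a definition of the hardware: memory is modeled as a hypothetical array of doublewords indexed by the effective address shifted right by 3, and `effective_address`, `load_doubleword`, and `load_into_pair` are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address: index GPR + base GPR, with bits 0..2 treated as 0s. */
static uint64_t effective_address(uint64_t base, uint64_t index) {
    return (base + index) & ~(uint64_t)7;
}

/* Load from a hypothetical memory model stored as an array of doublewords. */
static uint64_t load_doubleword(const uint64_t *mem, uint64_t base, uint64_t index) {
    return mem[effective_address(base, index) >> 3];
}

/* 32-bit register file: lower half into fd, upper half into the next register. */
static void load_into_pair(uint32_t *regs, unsigned fd, uint64_t dw) {
    regs[fd] = (uint32_t)dw;
    regs[fd + 1u] = (uint32_t)(dw >> 32);
}
```

For instance, a base of 8 and an index of 5 give 13, which is truncated to the doubleword address 8.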

Conversely, the store instruction stores a doubleword from a vector register such as an FPR to the memory while ignoring alignment. The store operation is carried out through the exemplary load/store unit 302 illustrated above in Figure 3. The contents of a 64-bit doubleword in FPR fs are stored at the memory location specified by the effective address. The contents of GPR index and GPR base are added to form the effective address. The effective address is doubleword aligned; the last three bits of the effective address are ignored.

The effective address is formed by adding an index value held in a general purpose register (GPR) to a base address held in another GPR, while ignoring the lowest three bits of the effective address by interpreting them as 0s. That is, the effective address is comprised of bits 3 to 63. The ignored three bits contain the byte address for accessing individual bytes within a doubleword. If the size of a vector register is 64 bits, then the contents of the vector register are stored into memory. If, on the other hand, the size of a vector register is 32 bits, then the 32 bits held in the vector register, forming the lower half of the data, are concatenated with the 32 bits held in the next register in sequence, forming the upper half. The concatenated 64-bit doubleword is then stored into memory at the address specified by the effective address.
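A matching sketch of the store path, again assuming a hypothetical doubleword-array memory model, rebuilds the 64-bit value from a 32-bit register pair before writing it at the aligned address. `concat_pair` and `store_doubleword` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Concatenate a 32-bit register pair: fs holds the lower half, fs+1 the upper. */
static uint64_t concat_pair(const uint32_t *regs, unsigned fs) {
    return (uint64_t)regs[fs] | ((uint64_t)regs[fs + 1u] << 32);
}

/* Store into a hypothetical doubleword-array memory at the aligned address. */
static void store_doubleword(uint64_t *mem, uint64_t base, uint64_t index, uint64_t dw) {
    uint64_t ea = (base + index) & ~(uint64_t)7;   /* lowest 3 bits ignored */
    mem[ea >> 3] = dw;
}
```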
ALIGNMENT INSTRUCTION
The present alignment instruction operates on two 64-bit doublewords loaded into two registers from memory by issuing two load instructions. One doubleword is loaded into a first register (vs) and the other doubleword is loaded into a second register (vt). The alignment instruction generates a 64-bit doubleword vector in a third register (vd) aligned for a SIMD vector operation. Preferably, an alignment unit performs alignment of a vector by a funnel shift that extracts an aligned 64-bit vector of elements from the two 64-bit registers.

Figure 4 illustrates a block diagram of an alignment unit in a processor for aligning a vector of elements. The vector load/store unit 404 loads two vectors from main memory 402 into two vector registers, vs and vt, in a register file 408. The alignment unit 410 receives the two vectors in the vector registers, vs and vt, and extracts a byte-aligned vector. Three control lines 412, representing the three bits of the byte address, control the byte alignment performed by the alignment unit 410. The aligned vector is then forwarded to an exemplary vector register, vd, in the register file.

The alignment of a vector depends on the byte ordering mode of the processor. Byte ordering within a larger data size, such as a 64-bit doubleword, may be configured in either big-endian or little-endian order. Endian order refers to the location of byte 0 within multi-byte data. A processor according to the present invention may be configured as either a big-endian or a little-endian system. For example, in a little-endian system, byte 0 is the least significant (i.e., rightmost) byte. On the other hand, in a big-endian system, byte 0 is the most significant (i.e., leftmost) byte. In the present invention, an exemplary processor uses byte addressing for a doubleword access, which is aligned on a byte boundary divisible by eight (i.e., 0, 8, 16, ..., 56). Hence, a 64-bit doubleword loaded into a register in a processor is byte-aligned in either a big-endian or a little-endian mode. For a little-endian mode processor, the starting (i.e., first) byte for a vector to be extracted lies in the second vector register. Conversely, for a big-endian mode processor, the starting (i.e., first) byte for the vector resides in the first vector register.

Figure 5 illustrates a flow diagram of the steps involved in extracting an aligned vector from two exemplary vectors. In step 502, two 64-bit doublewords are loaded from a memory unit into two 64-bit registers. One 64-bit doubleword is loaded into the first register and the other 64-bit doubleword is loaded into the second register. Preferably, the two doublewords are stored in contiguous memory space, and their starting addresses differ by 64 bits (8 bytes). The doublewords are loaded through a load/store unit according to the load instruction described above.

The starting byte address of the aligned vector to be extracted is then determined in step 504. According to the preferred embodiment, the registers and vectors are all 64 bits wide. Since a 64-bit doubleword contains 8 bytes, three bits are needed to specify all the byte positions in a 64-bit doubleword. Hence, the preferred embodiment uses 3 bits to specify the position of the starting byte address in a 64-bit vector.

In one embodiment of the present invention, an alignment instruction provides an immediate, which is a constant byte address within a doubleword. Preferably, the immediate consists of 3 bits specifying a constant byte address of one of the 8 bytes in the first register (i.e., little-endian mode processor) or the second register (i.e., big-endian mode processor). This alignment instruction performs a constant alignment of a vector. The align amount is computed by masking the immediate, then using that value to control a funnel shift of vector vs concatenated with vector vt. The operands can be in the QH, OB, or BW format.

In an alternative embodiment, the alignment instruction provides variable byte addressing by specifying the address of a general purpose register (GPR) containing the starting byte address in the first register. This instruction accesses the GPR using the address provided in the alignment instruction. Then, the instruction extracts the lower 3 bits of the GPR to obtain the starting byte address in the first register (i.e., little-endian mode) or the second register (i.e., big-endian mode). The align amount is computed by masking the contents of GPR rs, then using that value to control a funnel shift of vector vs concatenated with vector vt. The operands can be in QH, OB, or BW format.

After determining the starting byte address in step 504 of the flowchart in Figure 5, the first bit of the starting byte address is determined in step 506 by multiplying the starting byte address by 8. For example, if the starting byte address were 3, the first bit of the starting byte address would be 3*8, or 24. Then, in step 508, a 64-bit doubleword is extracted by concatenating bits starting from the first bit at the starting byte address in one register and continuing through the other register. This concatenation is accomplished by funnel shifting from the first bit of the starting byte. Specifically, the first register is assigned bit positions 0 to 63, and the second register is assigned the next 64 bit positions, 64 to 127. The extraction scheme depends on the byte ordering mode. A variable s, representing the first bit position at the starting byte address, can be used to illustrate the differences between the byte ordering modes. In big-endian byte mode, the concatenation runs from bit position 127-s down to 64-s. Conversely, in little-endian byte mode, the concatenation runs from bit position s through 63+s.
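The funnel shift of steps 506 and 508 can be sketched in C as follows. The sketch assumes that the 128-bit concatenation places vs in bits 0 to 63 in little-endian mode and in bits 64 to 127 in big-endian mode, so that both bit ranges given above begin at byte offset `sa` within vs. `align_le` and `align_be` are hypothetical names, and `sa` stands for the 3-bit immediate or the low 3 bits of GPR rs.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian: 128-bit value vt:vs (vs in bits 0..63); take bits s..63+s. */
static uint64_t align_le(uint64_t vs, uint64_t vt, unsigned sa) {
    unsigned s = (sa & 7u) * 8u;          /* mask the byte offset, then bytes -> bits */
    if (s == 0)
        return vs;                        /* avoid an undefined shift by 64 */
    return (vs >> s) | (vt << (64u - s));
}

/* Big-endian: 128-bit value vs:vt (vs in bits 64..127); take bits 127-s..64-s. */
static uint64_t align_be(uint64_t vs, uint64_t vt, unsigned sa) {
    unsigned s = (sa & 7u) * 8u;
    if (s == 0)
        return vs;
    return (vs << s) | (vt >> (64u - s));
}
```

With memory bytes 0x00 through 0x0F loaded into vs and vt in the respective byte order, an offset of 3 yields the eight bytes starting at address 3 in either mode. The special case for s = 0 exists only because a 64-bit shift is undefined in C.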

Then, in step 510, the extracted vector is replicated into a destination register in the register file for SIMD vector processing. In an alternative embodiment, the extracted vector may be stored into the memory unit for later use. The process then terminates in step 512.

SHUFFLE INSTRUCTION 
The shuffle instruction according to the present invention provides a 
vector of ordered elements selected from either one or two other vector 
registers. One or more load/store instructions are used to load the vector(s)
into registers for the shuffle operation. One embodiment uses a full byte-mode
crossbar to generate a vector of elements selected from the elements of two 
other exemplary vectors. That is, selected elements of the exemplary vectors, 
vs and vt, are merged into a new exemplary vector, vd. The new vector, vd, 
contains elements aligned for SIMD operation. Alternatively, a plurality of 
shuffle operations may be carried out to arrange the elements in a desired 
order for SIMD vector processing. 

Figure 6A illustrates a block diagram of a full byte-mode crossbar circuit 600 used in generating a vector of elements from elements of two registers. First, two vectors from a memory unit are loaded into two exemplary registers in a processor; the elements of the first vector are loaded into the first register, vs 602, and the elements of the second vector are loaded into the second register, vt 604. The elements of these two vector registers, vs 602 and vt 604, serve as source elements. The crossbar circuit 600 receives as input each of the elements from the two vector registers in parallel. A set of control lines 608 is coupled to the crossbar circuit 600 to relay a specific shuffle instruction operation. The shuffle instruction operation encodes a destination element for each of the selected source elements. In response to the specific shuffle instruction operation signals, the crossbar circuit 600 selects a set of elements from the two registers, vs 602 and vt 604, and routes or replicates each element to its associated destination element in an exemplary destination register, vd 606.
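A behavioral sketch of the OB-mode crossbar follows. It assumes a hypothetical encoding in which, for each destination element, a selector value 0 to 7 picks vs[0] to vs[7] and 8 to 15 picks vt[0] to vt[7]; the actual encoding carried on control lines 608 is not given in the text. `crossbar_ob` and `byte_at` are illustrative names, and element i is assumed to occupy bits 8i to 8i+7.

```c
#include <assert.h>
#include <stdint.h>

/* Element i of an OB vector, assumed to occupy bits 8i..8i+7. */
static uint8_t byte_at(uint64_t v, unsigned i) {
    return (uint8_t)(v >> (8u * i));
}

/* sel[j] chooses the source of vd[j]: 0..7 -> vs[0..7], 8..15 -> vt[0..7]. */
static uint64_t crossbar_ob(uint64_t vs, uint64_t vt, const uint8_t sel[8]) {
    uint64_t vd = 0;
    for (unsigned j = 0; j < 8u; j++) {
        unsigned s = sel[j] & 15u;
        uint8_t b = (s < 8u) ? byte_at(vs, s) : byte_at(vt, s - 8u);
        vd |= (uint64_t)b << (8u * j);     /* route (or replicate) into vd[j] */
    }
    return vd;
}
```

Because several destination elements may name the same source, the model covers both routing (a permutation) and replication (a broadcast).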

In addition, the present invention allows zeroing and sign extension of elements. For example, with reference to Figure 6A, the present invention provides either zeroing or sign extension for each element in the first register, vs 602. In addition to providing all of their bits to the crossbar circuit 600, elements 0 through 7 in the first register, vs 602, provide their corresponding sign bits 612, 614, 616, 618, 620, 622, 624, and 626 (612 through 626) to the associated AND gates 628, 630, 632, 634, 636, 638, 640, and 642 (628 through 642). Each of the AND gates 628 through 642 also receives, as its other input, a control signal 610, which originates from a specific shuffle instruction and specifies either zeroing or sign extension mode.

Figure 6B shows a more detailed diagram of the operation of the exemplary AND gate 628 associated with element 7 in the first register, vs 602. The AND gate 628 receives a single sign bit 612 from the most significant bit of element 7 of the first register, vs 602. The AND gate 628 also receives the control signal 610. To provide zeroing for element 7, for example, the control signal 610 inputs a 0 into the AND gate 628. In this case, the output 652 of the AND gate 628 is 0 regardless of the value of the sign bit 612. On the other hand, when the control signal is 1, the AND gate 628 passes the sign bit 612 through as the output 652, whatever its value. In both cases, zeroing and sign extension, the output 652 is routed to a plurality of output lines 654 that replicate the output signal to an appropriate width. Preferably, the number of output lines 654 matches the number of bits in each element in the first register, vs 602. The crossbar circuit 600 accepts the signals on these output lines 654 and uses them to zero or sign extend element 7 when necessary according to a shuffle instruction. The AND gates for the other elements 0 to 6 operate in a similar manner to provide zeroing and sign extension bit signals to the crossbar circuit 600.
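The gate-level behavior of Figure 6B can be modeled in a few lines of C. In this sketch the hypothetical functions `and_gate` and `widen_element` stand in for AND gate 628 and for the replication onto output lines 654 when an 8-bit element is widened to 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* AND gate: sign bit of the element ANDed with the control signal
   (control 0 = zeroing, control 1 = sign extension). */
static unsigned and_gate(uint8_t elem, unsigned control) {
    unsigned sign_bit = (elem >> 7) & 1u;
    return sign_bit & (control & 1u);
}

/* Replicate the gate output across the element width (the output lines)
   and use it as the upper byte of the widened 16-bit result. */
static uint16_t widen_element(uint8_t elem, unsigned control) {
    uint8_t fill = and_gate(elem, control) ? 0xFFu : 0x00u;
    return (uint16_t)(((unsigned)fill << 8) | elem);
}
```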

The preferred embodiment of the present invention operates on vectors of elements in OB or QH mode. In OB mode, a 64-bit doubleword vector is interpreted as having 8 8-bit elements. In QH mode, the 64-bit vector is treated as containing 4 16-bit elements. For example, in OB mode, the crossbar circuit 600 selects in parallel, as source elements, eight 8-bit elements from among the elements in the registers vs 602 and vt 604. Each of the eight elements is then replicated or routed into a particular destination element in the destination vector register, vd 606. In QH mode, the crossbar circuit selects four 16-bit elements and replicates or routes each element into a particular destination element in the destination register. Those skilled in the art will appreciate that the crossbar circuit represents one embodiment of the present invention for implementing the shuffle instruction operations. A crossbar circuit is well known in the art and is commonly used in conjunction with vector processing units.



Figure 7 illustrates shuffle operations for ordering 8-bit elements in a 64-bit doubleword. Each row represents the destination vector register, vd, comprised of 8 elements, vd[0] to vd[7]. The first row 702 is comprised of placeholders to indicate the 8 elements. Below the first row 702 are 8 different shuffle operations in OB mode, as indicated by the content of the destination vector register, vd, in each of rows 704 to 718. These shuffle operations in OB mode are illustrated in Figures 8A through 8H.

Figure 8A illustrates a block diagram of a shuffle operation, which converts four unsigned upper bytes (i.e., 8 bits) in a source register to four 16-bit halves in a destination register. This shuffle operation, represented by the mnemonic UPUH.OB, selects the upper 4 8-bit elements in an exemplary vector register, vs. The selected elements vs[4], vs[5], vs[6], and vs[7] are replicated into destination elements vd[0], vd[2], vd[4], and vd[6], respectively. The odd elements of the destination vector register, vd[1], vd[3], vd[5], and vd[7], are zeroed.

Figure 8B illustrates a block diagram of a shuffle operation, which converts a vector of unsigned low 4 bytes in a register to 16-bit halves. This shuffle operation, represented by the mnemonic UPUL.OB, selects the lower 4 8-bit elements in an exemplary vector register, vs. The selected elements vs[0], vs[1], vs[2], and vs[3] are replicated into destination elements vd[0], vd[2], vd[4], and vd[6], respectively. The odd elements of the destination vector register, vd[1], vd[3], vd[5], and vd[7], are zeroed.

Figure 8C illustrates a block diagram of a shuffle operation, which converts a vector of signed upper 4 bytes in a register to 16-bit halves. This shuffle operation, represented by the mnemonic UPSH.OB, selects the upper 4 8-bit elements in an exemplary vector register, vs. The selected elements vs[4], vs[5], vs[6], and vs[7] are replicated into destination elements vd[0], vd[2], vd[4], and vd[6], respectively. The odd elements of the destination vector register, vd[1], vd[3], vd[5], and vd[7], replicate the sign bits of the selected elements vs[4], vs[5], vs[6], and vs[7], respectively.

Figure 8D illustrates a block diagram of a shuffle operation, which converts a vector of signed low 4 bytes in a register to 16-bit halves. This shuffle operation, represented by the mnemonic UPSL.OB, selects the lower 4 8-bit elements in an exemplary vector register, vs. The selected elements vs[0], vs[1], vs[2], and vs[3] are replicated into destination elements vd[0], vd[2], vd[4], and vd[6], respectively. The odd elements of the destination vector register, vd[1], vd[3], vd[5], and vd[7], replicate the sign bits of the selected elements vs[0], vs[1], vs[2], and vs[3], respectively.
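The four unpack operations of Figures 8A through 8D differ only in which half of vs is selected and whether the odd destination elements are zeroed or filled with sign bits, so one hypothetical helper can model them all. The sketch assumes element i of an OB vector occupies bits 8i to 8i+7, so each even/odd destination pair forms one little-endian 16-bit half; `unpack_ob` is an illustrative name, not an instruction defined in the text.

```c
#include <assert.h>
#include <stdint.h>

/* upper = 0: UPUL/UPSL (vs[0..3]); upper = 1: UPUH/UPSH (vs[4..7]).
   sign_extend = 0: odd elements zeroed; 1: odd elements get sign bits. */
static uint64_t unpack_ob(uint64_t vs, int upper, int sign_extend) {
    unsigned first = upper ? 4u : 0u;
    uint64_t vd = 0;
    for (unsigned i = 0; i < 4u; i++) {
        uint8_t b = (uint8_t)(vs >> (8u * (first + i)));
        uint8_t fill = (sign_extend && (b & 0x80u)) ? 0xFFu : 0x00u;
        vd |= (uint64_t)b << (8u * (2u * i));          /* even element vd[2i]  */
        vd |= (uint64_t)fill << (8u * (2u * i + 1u));  /* odd element vd[2i+1] */
    }
    return vd;
}
```

For example, the byte 0xFF in vs[4] becomes the 16-bit half 0x00FF under UPUH.OB but 0xFFFF under UPSH.OB.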

Figure 8E illustrates a block diagram of a shuffle operation, which replicates the odd elements of the 8 8-bit elements from each of two source registers into 8 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic PACH.OB, selects the odd elements of the 8 8-bit elements in exemplary source vector registers, vs and vt. The elements selected from vs, namely vs[1], vs[3], vs[5], and vs[7], are replicated into destination elements vd[4], vd[5], vd[6], and vd[7], respectively. The elements vt[1], vt[3], vt[5], and vt[7] from the vector register vt are replicated into destination elements vd[0], vd[1], vd[2], and vd[3], respectively.



Figure 8F illustrates a block diagram of a shuffle operation, which replicates the even elements of the 8 8-bit elements from each of two source registers into 8 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic PACL.OB, selects the even elements of the 8 8-bit elements in exemplary source vector registers, vs and vt. The elements selected from vs, namely vs[0], vs[2], vs[4], and vs[6], are replicated into destination elements vd[4], vd[5], vd[6], and vd[7], respectively. The elements vt[0], vt[2], vt[4], and vt[6] from the vector register vt are replicated into destination elements vd[0], vd[1], vd[2], and vd[3], respectively.
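PACH.OB and PACL.OB differ only in whether odd or even source elements are taken. A minimal C model, assuming element i occupies bits 8i to 8i+7 and using the hypothetical helper `pack_ob`, is:

```c
#include <assert.h>
#include <stdint.h>

/* odd = 1 models PACH.OB, odd = 0 models PACL.OB.
   vt's selected elements fill vd[0..3]; vs's selected elements fill vd[4..7]. */
static uint64_t pack_ob(uint64_t vs, uint64_t vt, unsigned odd) {
    uint64_t vd = 0;
    for (unsigned i = 0; i < 4u; i++) {
        unsigned src = 2u * i + (odd & 1u);              /* source element index */
        uint64_t from_vt = (vt >> (8u * src)) & 0xFFu;
        uint64_t from_vs = (vs >> (8u * src)) & 0xFFu;
        vd |= from_vt << (8u * i);
        vd |= from_vs << (8u * (i + 4u));
    }
    return vd;
}
```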

Figure 8G illustrates a block diagram of a shuffle operation, which replicates the upper 4 elements of the 8 8-bit elements from each of two source registers into 8 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic MIXH.OB, selects the upper 4 elements of the 8 8-bit elements in exemplary source vector registers, vs and vt. The elements selected from vs, namely vs[4], vs[5], vs[6], and vs[7], are replicated into the odd elements of the destination vector register, namely vd[1], vd[3], vd[5], and vd[7], respectively. The elements vt[4], vt[5], vt[6], and vt[7] from the vector register vt are replicated into the even destination elements vd[0], vd[2], vd[4], and vd[6], respectively.

Figure 8H illustrates a block diagram of a shuffle operation, which replicates the lower 4 elements of the 8 8-bit elements from each of two source registers into 8 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic MIXL.OB, selects the lower 4 elements of the 8 8-bit elements in exemplary source vector registers, vs and vt. The elements selected from vs, namely vs[0], vs[1], vs[2], and vs[3], are replicated into the odd elements of the destination vector register, namely vd[1], vd[3], vd[5], and vd[7], respectively. The elements vt[0], vt[1], vt[2], and vt[3] from the vector register vt are replicated into the even destination elements vd[0], vd[2], vd[4], and vd[6], respectively.
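Likewise, MIXH.OB and MIXL.OB can be modeled by one hypothetical helper, `mix_ob`, that interleaves one half of vt into the even destination elements and the same half of vs into the odd ones. As before, element i is assumed to occupy bits 8i to 8i+7.

```c
#include <assert.h>
#include <stdint.h>

/* upper = 1 models MIXH.OB (elements 4..7); upper = 0 models MIXL.OB (0..3).
   vd[2i] = vt[base+i] and vd[2i+1] = vs[base+i]. */
static uint64_t mix_ob(uint64_t vs, uint64_t vt, int upper) {
    unsigned base = upper ? 4u : 0u;
    uint64_t vd = 0;
    for (unsigned i = 0; i < 4u; i++) {
        uint64_t t = (vt >> (8u * (base + i))) & 0xFFu;
        uint64_t s = (vs >> (8u * (base + i))) & 0xFFu;
        vd |= t << (8u * (2u * i));          /* even destination element */
        vd |= s << (8u * (2u * i + 1u));     /* odd destination element  */
    }
    return vd;
}
```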

A shuffle instruction operating in QH mode generates a new vector of elements for two types of operations. The first type of operation creates a vector of new data sizes by converting data sizes between 16-bit elements and 32-bit elements in a vector. The second type creates a new vector of elements drawn from two other vectors. The present exemplary data type conversion operations enable a computational data format with a larger range than the storage format, such as 32-bit computation on 16-bit numbers. In addition, the present embodiment operations allow conversion of a data set from a smaller range format to a larger range format, or vice versa, as between 16-bit and 32-bit data.

Figure 9 illustrates shuffle operations for ordering 16-bit elements in a 64-bit doubleword. Each row represents the destination vector register, vd, comprised of 4 elements, vd[0] to vd[3]. The first row 902 is comprised of placeholders to indicate the 4 elements. Below the first row 902 are 8 different shuffle operations in QH mode, as indicated by the content of the destination vector register, vd, in each of rows 904 to 918. These shuffle operations in QH mode are illustrated in Figures 10A through 10H.
Figure 10A illustrates a block diagram of a shuffle operation, which replicates the upper 2 elements of the 4 16-bit elements from each of two source registers into 4 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic MIXH.QH, selects the upper 2 elements of the 4 16-bit elements in exemplary source vector registers, vs and vt. The elements selected from vs, namely vs[2] and vs[3], are replicated into the odd elements of the destination vector register, namely vd[1] and vd[3], respectively. The elements vt[2] and vt[3] from the vector register vt are replicated into the even destination elements vd[0] and vd[2], respectively.

Figure 10B illustrates a block diagram of a shuffle operation, which replicates the lower 2 elements of the 4 16-bit elements from each of two source registers into 4 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic MIXL.QH, selects the lower 2 elements of the 4 16-bit elements in exemplary source vector registers, vs and vt. The elements selected from vs, namely vs[0] and vs[1], are replicated into the odd elements of the destination vector register, namely vd[1] and vd[3], respectively. The elements vt[0] and vt[1] from the vector register vt are replicated into the even destination elements vd[0] and vd[2], respectively.



Figure 10C illustrates a block diagram of a shuffle operation, which replicates the 2 odd elements of the 4 16-bit elements from each of two source registers into 4 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic PACH.QH, selects the 2 odd elements of the 4 16-bit elements in exemplary source vector registers, vs and vt. The elements selected from vs, namely vs[1] and vs[3], are replicated into the upper 2 elements of the destination vector register, namely vd[2] and vd[3], respectively. The elements vt[1] and vt[3] from the vector register vt are replicated into the lower 2 destination elements vd[0] and vd[1], respectively.

Figure 10D illustrates a block diagram of a shuffle operation, which replicates the 2 even elements of the 4 16-bit elements from each of two source registers into 4 elements in a destination vector register. This shuffle operation, represented by an exemplary mnemonic PACL.QH, selects the 2 even elements of the 4 16-bit elements in exemplary source vector registers, vs and vt. The elements selected from vs, namely vs[0] and vs[2], are replicated into the upper 2 elements of the destination vector register, namely vd[2] and vd[3], respectively. The elements vt[0] and vt[2] from the vector register vt are replicated into the lower 2 destination elements vd[0] and vd[1], respectively.
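The four QH operations of Figures 10A through 10D can be sketched as fixed element permutations. In this C sketch, element i of a QH vector is assumed to occupy bits 16i to 16i+15, and `qh_elem`, `make_qh`, and the lowercase operation names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Element i of a QH vector, assumed to occupy bits 16i..16i+15. */
static uint16_t qh_elem(uint64_t v, unsigned i) {
    return (uint16_t)(v >> (16u * i));
}

/* Build a QH vector from elements vd[0]..vd[3]. */
static uint64_t make_qh(uint16_t e0, uint16_t e1, uint16_t e2, uint16_t e3) {
    return (uint64_t)e0 | ((uint64_t)e1 << 16) | ((uint64_t)e2 << 32) | ((uint64_t)e3 << 48);
}

/* MIXH.QH: vd = { vt[2], vs[2], vt[3], vs[3] } */
static uint64_t mixh_qh(uint64_t vs, uint64_t vt) {
    return make_qh(qh_elem(vt, 2), qh_elem(vs, 2), qh_elem(vt, 3), qh_elem(vs, 3));
}

/* MIXL.QH: vd = { vt[0], vs[0], vt[1], vs[1] } */
static uint64_t mixl_qh(uint64_t vs, uint64_t vt) {
    return make_qh(qh_elem(vt, 0), qh_elem(vs, 0), qh_elem(vt, 1), qh_elem(vs, 1));
}

/* PACH.QH: vd = { vt[1], vt[3], vs[1], vs[3] } */
static uint64_t pach_qh(uint64_t vs, uint64_t vt) {
    return make_qh(qh_elem(vt, 1), qh_elem(vt, 3), qh_elem(vs, 1), qh_elem(vs, 3));
}

/* PACL.QH: vd = { vt[0], vt[2], vs[0], vs[2] } */
static uint64_t pacl_qh(uint64_t vs, uint64_t vt) {
    return make_qh(qh_elem(vt, 0), qh_elem(vt, 2), qh_elem(vs, 0), qh_elem(vs, 2));
}
```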

Figure 10E illustrates a block diagram of a shuffle operation, which replicates even elements from one source register and odd elements from another source register into a destination vector register. This shuffle operation, represented by an exemplary mnemonic BFLA.QH, selects the 2 even elements of the 4 16-bit elements from an exemplary source vector register, vs. The shuffle operation also selects the 2 odd elements of the 4 16-bit elements from another exemplary source vector register, vt. The even elements selected from vs, namely vs[0] and vs[2], are replicated into the 2 odd elements of the destination vector register, namely vd[1] and vd[3], respectively. The odd elements vt[1] and vt[3] from the vector register vt are replicated into the 2 even destination elements vd[0] and vd[2], respectively.

Figure 10F illustrates a block diagram of a shuffle operation, which replicates even elements from one source register and odd elements from another source register into a destination vector register. This shuffle operation, represented by an exemplary mnemonic BFLB.QH, selects the 2 even elements of the 4 16-bit elements from an exemplary source vector register, vs. The shuffle operation also selects the 2 odd elements of the 4 16-bit elements from another exemplary source vector register, vt. The even elements selected from vs, namely vs[0] and vs[2], are replicated into the 2 odd elements of the destination vector register in reverse order, namely vd[3] and vd[1], respectively. The odd elements vt[1] and vt[3] from the vector register vt are replicated into the 2 even destination elements in reverse order, namely vd[2] and vd[0], respectively.

Figure 10G illustrates a block diagram of a shuffle operation, which replicates the upper 2 elements of the 4 16-bit elements from each of two source registers into a destination vector register. This shuffle operation, represented by an exemplary mnemonic REPA.QH, selects the upper 2 elements of the 4 16-bit elements in exemplary source vector registers, vs and vt. The upper elements selected from vs, namely vs[2] and vs[3], are replicated into the upper elements of the destination vector register, namely vd[2] and vd[3], respectively. The upper elements vt[2] and vt[3] from the vector register vt are replicated into the lower destination elements vd[0] and vd[1], respectively.
Figure 10H illustrates a block diagram of a shuffle operation, which replicates the lower 2 elements of the 4 16-bit elements from each of two source registers into a destination vector register. This shuffle operation, represented by an exemplary mnemonic REPB.QH, selects the lower 2 elements of the 4 16-bit elements in exemplary source vector registers, vs and vt. The lower elements selected from vs, namely vs[0] and vs[1], are replicated into the upper elements of the destination vector register, namely vd[2] and vd[3], respectively. The lower elements vt[0] and vt[1] from the vector register vt are replicated into the lower destination elements vd[0] and vd[1], respectively.
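Following the descriptions of Figures 10E, 10G, and 10H, and taking the lower destination elements to be vd[0] and vd[1], BFLA.QH, REPA.QH, and REPB.QH can also be sketched as fixed permutations of 16-bit elements. As before, element i is assumed to occupy bits 16i to 16i+15, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t qh_elem(uint64_t v, unsigned i) {
    return (uint16_t)(v >> (16u * i));
}

static uint64_t make_qh(uint16_t e0, uint16_t e1, uint16_t e2, uint16_t e3) {
    return (uint64_t)e0 | ((uint64_t)e1 << 16) | ((uint64_t)e2 << 32) | ((uint64_t)e3 << 48);
}

/* BFLA.QH: odd elements of vt -> vd[0], vd[2]; even elements of vs -> vd[1], vd[3]. */
static uint64_t bfla_qh(uint64_t vs, uint64_t vt) {
    return make_qh(qh_elem(vt, 1), qh_elem(vs, 0), qh_elem(vt, 3), qh_elem(vs, 2));
}

/* REPA.QH: upper elements of vt -> vd[0..1]; upper elements of vs -> vd[2..3]. */
static uint64_t repa_qh(uint64_t vs, uint64_t vt) {
    return make_qh(qh_elem(vt, 2), qh_elem(vt, 3), qh_elem(vs, 2), qh_elem(vs, 3));
}

/* REPB.QH: lower elements of vt -> vd[0..1]; lower elements of vs -> vd[2..3]. */
static uint64_t repb_qh(uint64_t vs, uint64_t vt) {
    return make_qh(qh_elem(vt, 0), qh_elem(vt, 1), qh_elem(vs, 0), qh_elem(vs, 1));
}
```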



The shuffle instructions allow more efficient SIMD vector operations. First, the shuffle operations create a vector of new data sizes by converting between 8-bit elements and 16-bit elements in a vector. These data type conversions enable a computational data format with a larger range than the storage format, such as 16-bit computation on 8-bit numbers. For example, these operations allow conversion of a data set from a smaller range format to a larger range format, or vice versa, as between 8-bit and 16-bit audio or video data.

Second, the shuffle operations are also useful for interleaving and deinterleaving data. For example, some applications store multiple-channel data in separate arrays, or interleaved in a single array. These applications typically require interleaving or deinterleaving the multiple channels. In these applications, separate R, G, B, and A byte arrays may be converted into an interleaved RGBA array by the following series of shuffle instructions:

MIXL.OB RGL, R, G          ; RGRGRGRG
MIXL.OB BAL, B, A          ; BABABABA
MIXH.OB RGH, R, G          ; RGRGRGRG
MIXH.OB BAH, B, A          ; BABABABA
MIXL.QH RGBALL, RGL, BAL   ; RGBARGBA
MIXH.QH RGBALH, RGL, BAL   ; RGBARGBA
MIXL.QH RGBAHL, RGH, BAH   ; RGBARGBA
MIXH.QH RGBAHH, RGH, BAH   ; RGBARGBA
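The interleaving sequence can be checked with a small Python model. It assumes that a register is written highest element first (as the RGRGRGRG result comments appear to be) and that this is also memory address order, as under big-endian byte ordering. MIXL and MIXH are modeled from Claims 38 and 40 (eight 8-bit elements) and Claims 18 and 20 (four 16-bit elements): vd[2k] comes from vt and vd[2k+1] from vs. The last four instructions are treated as 16-bit (QH) mixes, which is what their RGBARGBA results imply. All helper names are hypothetical. Under these assumptions RGBAHH holds pixels 0 and 1 and RGBALL holds pixels 6 and 7, so the results are concatenated in the order HH, HL, LH, LL.

```python
def to_elems(v, w):
    """Split a list written highest element first into elements of w items.

    The returned list is indexed by element number (index 0 = element 0).
    """
    groups = [tuple(v[i:i + w]) for i in range(0, len(v), w)]
    return groups[::-1]


def from_elems(elems):
    """Inverse of to_elems: flatten back to highest-element-first order."""
    return [b for e in reversed(elems) for b in e]


def mix(vs, vt, w, high):
    """Model of MIXL (high=False) / MIXH (high=True); w=1 for .OB, w=2 for .QH."""
    es, et = to_elems(vs, w), to_elems(vt, w)
    half = len(es) // 2
    base = half if high else 0
    vd = []
    for k in range(half):
        vd += [et[base + k], es[base + k]]  # vd[2k] <- vt, vd[2k+1] <- vs
    return from_elems(vd)


def interleave_rgba(R, G, B, A):
    rgl, bal = mix(R, G, 1, False), mix(B, A, 1, False)  # MIXL.OB
    rgh, bah = mix(R, G, 1, True), mix(B, A, 1, True)    # MIXH.OB
    ll = mix(rgl, bal, 2, False)                         # MIXL.QH RGBALL
    lh = mix(rgl, bal, 2, True)                          # MIXH.QH RGBALH
    hl = mix(rgh, bah, 2, False)                         # MIXL.QH RGBAHL
    hh = mix(rgh, bah, 2, True)                          # MIXH.QH RGBAHH
    return hh + hl + lh + ll                             # pixels 0-1, 2-3, 4-5, 6-7


R = [f"R{i}" for i in range(8)]
G = [f"G{i}" for i in range(8)]
B = [f"B{i}" for i in range(8)]
A = [f"A{i}" for i in range(8)]
print(" ".join(interleave_rgba(R, G, B, A)[:8]))  # R0 G0 B0 A0 R1 G1 B1 A1
```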



Conversely, an interleaved RGBA array may be deinterleaved into separate R,
G, B, and A arrays by the following series of shuffle instructions:



PACL.OB GA0GA1, RGBA0, RGBA1
PACH.OB RB0RB1, RGBA0, RGBA1
PACL.OB GA2GA3, RGBA2, RGBA3
PACH.OB RB2RB3, RGBA2, RGBA3
PACL.OB A0A1A2A3, GA0GA1, GA2GA3
PACH.OB G0G1G2G3, GA0GA1, GA2GA3
PACL.OB B0B1B2B3, RB0RB1, RB2RB3
PACH.OB R0R1R2R3, RB0RB1, RB2RB3
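The deinterleaving sequence can be modeled the same way. This sketch assumes registers are written highest element first, matching memory address order (as under big-endian byte ordering), so RGBA0 holds R0 G0 B0 A0 R1 G1 B1 A1. PACL and PACH are modeled from Claims 36 and 34: the even (PACL) or odd (PACH) elements of vt fill vd[0..3] and those of vs fill vd[4..7]. The first instruction is read as PACL.OB, since it produces the G and A channels. The helper names are hypothetical.

```python
def pac_ob(vs, vt, high):
    """Model of PACL.OB (high=False, even elements) / PACH.OB (high=True, odd).

    vs, vt: 8-element lists written highest element first.
    """
    es, et = vs[::-1], vt[::-1]   # index by element number
    off = 1 if high else 0
    vd = et[off::2] + es[off::2]  # vd[0..3] from vt, vd[4..7] from vs
    return vd[::-1]               # back to highest-element-first order


def deinterleave_rgba(rgba0, rgba1, rgba2, rgba3):
    ga01, rb01 = pac_ob(rgba0, rgba1, False), pac_ob(rgba0, rgba1, True)
    ga23, rb23 = pac_ob(rgba2, rgba3, False), pac_ob(rgba2, rgba3, True)
    a = pac_ob(ga01, ga23, False)
    g = pac_ob(ga01, ga23, True)
    b = pac_ob(rb01, rb23, False)
    r = pac_ob(rb01, rb23, True)
    return r, g, b, a


pixels = [c for i in range(8) for c in (f"R{i}", f"G{i}", f"B{i}", f"A{i}")]
regs = [pixels[8 * j:8 * j + 8] for j in range(4)]  # RGBA0..RGBA3
r, g, b, a = deinterleave_rgba(*regs)
print(" ".join(a))  # A0 A1 A2 A3 A4 A5 A6 A7
```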



Third, some algorithms operate on two-dimensional arrays of data, such as
images. Such an array typically orders its elements along a major axis, where
the elements are consecutive, and a minor axis, where the elements are
separated by the size of the major axis. A transpose operation on the
two-dimensional array swaps the major and minor axes. A common example is
the discrete cosine transform (DCT), which requires transposing an 8x8 block
of the array. In this example, the 8x8 block consists of the following elements:








      d0  d1  d2  d3  d4  d5  d6  d7
s0    A0  B0  C0  D0  E0  F0  G0  H0
s1    A1  B1  C1  D1  E1  F1  G1  H1
s2    A2  B2  C2  D2  E2  F2  G2  H2
s3    A3  B3  C3  D3  E3  F3  G3  H3
s4    A4  B4  C4  D4  E4  F4  G4  H4
s5    A5  B5  C5  D5  E5  F5  G5  H5
s6    A6  B6  C6  D6  E6  F6  G6  H6
s7    A7  B7  C7  D7  E7  F7  G7  H7
The present invention can transpose the 8x8 array in OB mode in 24
instructions. The first 12 instructions produce d0 through d3 (the A through D
elements), and the remaining 12 produce d4 through d7 (the E through H
elements):

MIXH.OB t0, s0, s1   ; A0 A1 B0 B1 C0 C1 D0 D1
MIXH.OB t1, s2, s3   ; A2 A3 B2 B3 C2 C3 D2 D3
MIXH.OB t2, s4, s5   ; A4 A5 B4 B5 C4 C5 D4 D5
MIXH.OB t3, s6, s7   ; A6 A7 B6 B7 C6 C7 D6 D7
MIXH.QH u0, t0, t1   ; A0 A1 A2 A3 B0 B1 B2 B3
MIXH.QH u1, t2, t3   ; A4 A5 A6 A7 B4 B5 B6 B7
MIXL.QH u2, t0, t1   ; C0 C1 C2 C3 D0 D1 D2 D3
MIXL.QH u3, t2, t3   ; C4 C5 C6 C7 D4 D5 D6 D7
REPA.QH d0, u0, u1   ; A0 A1 A2 A3 A4 A5 A6 A7
REPB.QH d1, u0, u1   ; B0 B1 B2 B3 B4 B5 B6 B7
REPA.QH d2, u2, u3   ; C0 C1 C2 C3 C4 C5 C6 C7
REPB.QH d3, u2, u3   ; D0 D1 D2 D3 D4 D5 D6 D7
MIXL.OB t0, s0, s1   ; E0 E1 F0 F1 G0 G1 H0 H1
MIXL.OB t1, s2, s3   ; E2 E3 F2 F3 G2 G3 H2 H3
MIXL.OB t2, s4, s5   ; E4 E5 F4 F5 G4 G5 H4 H5
MIXL.OB t3, s6, s7   ; E6 E7 F6 F7 G6 G7 H6 H7
MIXH.QH u0, t0, t1   ; E0 E1 E2 E3 F0 F1 F2 F3
MIXH.QH u1, t2, t3   ; E4 E5 E6 E7 F4 F5 F6 F7
MIXL.QH u2, t0, t1   ; G0 G1 G2 G3 H0 H1 H2 H3
MIXL.QH u3, t2, t3   ; G4 G5 G6 G7 H4 H5 H6 H7
REPA.QH d4, u0, u1   ; E0 E1 E2 E3 E4 E5 E6 E7
REPB.QH d5, u0, u1   ; F0 F1 F2 F3 F4 F5 F6 F7
REPA.QH d6, u2, u3   ; G0 G1 G2 G3 G4 G5 G6 G7
REPB.QH d7, u2, u3   ; H0 H1 H2 H3 H4 H5 H6 H7
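As a cross-check, the Python model below runs the 24-instruction sequence on symbolic elements. It assumes each register is written highest element first, as in the table and result columns; MIXL/MIXH follow Claims 18, 20, 38, and 40 (vd[2k] from vt, vd[2k+1] from vs), and REPA/REPB place vt's selected pair in vd[0..1] and vs's pair in vd[2..3]. Because the listed results require it, the model uses MIXH.QH for u0 and u1, MIXL.QH for u2 and u3, and writes the second pass to d4 through d7. All helper names are hypothetical.

```python
def to_elems(v, w):
    """Split a list written highest element first into w-item elements.

    The result is indexed by element number (index 0 = element 0).
    """
    groups = [tuple(v[i:i + w]) for i in range(0, len(v), w)]
    return groups[::-1]


def from_elems(elems):
    """Inverse of to_elems: flatten back to highest-element-first order."""
    return [x for e in reversed(elems) for x in e]


def mix(vs, vt, w, high):
    """MIXL (high=False) or MIXH (high=True): vd[2k] <- vt, vd[2k+1] <- vs."""
    es, et = to_elems(vs, w), to_elems(vt, w)
    half = len(es) // 2
    base = half if high else 0
    vd = []
    for k in range(half):
        vd += [et[base + k], es[base + k]]
    return from_elems(vd)


def rep(vs, vt, w, upper):
    """REPA (upper=True) or REPB (upper=False): vt pair -> vd[0..1], vs pair -> vd[2..3]."""
    es, et = to_elems(vs, w), to_elems(vt, w)
    half = len(es) // 2
    base = half if upper else 0
    return from_elems(et[base:base + half] + es[base:base + half])


def transpose8x8(s):
    """Run the 24-instruction sequence; s[r] is row r, highest byte element first."""
    d = []
    for high in (True, False):  # MIXH.OB pass -> d0..d3, MIXL.OB pass -> d4..d7
        t = [mix(s[2 * i], s[2 * i + 1], 1, high) for i in range(4)]  # .OB, w=1
        u0, u1 = mix(t[0], t[1], 2, True), mix(t[2], t[3], 2, True)    # MIXH.QH
        u2, u3 = mix(t[0], t[1], 2, False), mix(t[2], t[3], 2, False)  # MIXL.QH
        d += [rep(u0, u1, 2, True), rep(u0, u1, 2, False),             # REPA, REPB
              rep(u2, u3, 2, True), rep(u2, u3, 2, False)]             # REPA, REPB
    return d


s = [[f"{c}{r}" for c in "ABCDEFGH"] for r in range(8)]
d = transpose8x8(s)
print(" ".join(d[0]))  # A0 A1 A2 A3 A4 A5 A6 A7
print(" ".join(d[7]))  # H0 H1 H2 H3 H4 H5 H6 H7
```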



In another example, an exemplary 4x4 array block consists of the following
elements:





      d0  d1  d2  d3
s0    A   B   C   D
s1    E   F   G   H
s2    I   J   K   L
s3    M   N   O   P



A transpose operation of the 4x4 array block in QH mode uses 8 shuffle
instructions, as follows:

MIXH.QH t0, s0, s1   ; A E B F
MIXH.QH t1, s2, s3   ; I M J N
REPA.QH d0, t0, t1   ; A E I M
REPB.QH d1, t0, t1   ; B F J N
MIXL.QH t0, s0, s1   ; C G D H
MIXL.QH t1, s2, s3   ; K O L P
REPA.QH d2, t0, t1   ; C G K O
REPB.QH d3, t0, t1   ; D H L P
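A compact Python model of the 4x4 sequence follows. Each register is a list of four 16-bit elements written highest element first, as in the table above; MIXL.QH/MIXH.QH follow Claims 20 and 18, and REPA.QH/REPB.QH place vt's selected pair in vd[0..1] and vs's pair in vd[2..3]. The helper names `mix_qh` and `rep_qh` are hypothetical.

```python
def mix_qh(vs, vt, high):
    """MIXL.QH (high=False) / MIXH.QH (high=True) on lists written highest element first."""
    s, t = vs[::-1], vt[::-1]              # index by element number
    b = 2 if high else 0
    vd = [t[b], s[b], t[b + 1], s[b + 1]]  # vd[0], vd[1], vd[2], vd[3]
    return vd[::-1]                        # back to highest-element-first order


def rep_qh(vs, vt, upper):
    """REPA.QH (upper=True) / REPB.QH (upper=False)."""
    s, t = vs[::-1], vt[::-1]
    b = 2 if upper else 0
    vd = [t[b], t[b + 1], s[b], s[b + 1]]
    return vd[::-1]


s0, s1, s2, s3 = list("ABCD"), list("EFGH"), list("IJKL"), list("MNOP")
t0, t1 = mix_qh(s0, s1, True), mix_qh(s2, s3, True)    # A E B F / I M J N
d0, d1 = rep_qh(t0, t1, True), rep_qh(t0, t1, False)   # A E I M / B F J N
t0, t1 = mix_qh(s0, s1, False), mix_qh(s2, s3, False)  # C G D H / K O L P
d2, d3 = rep_qh(t0, t1, True), rep_qh(t0, t1, False)   # C G K O / D H L P
print(["".join(x) for x in (d0, d1, d2, d3)])  # ['AEIM', 'BFJN', 'CGKO', 'DHLP']
```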

Shuffle instructions such as BFLA and BFLB reverse the order of elements in
an array, in pairs or in groups of 4. Larger groups can be reordered by memory
or register addressing, because they are multiples of a 64-bit doubleword. The
order of a large array can be inverted by reversing each vector of 4 elements
with BFLB and loading each doubleword from, or storing it to, the mirrored
address in the array. Similarly, a butterfly on a large array can be assembled
from doubleword addressing and BFLA or BFLB operations on the addressed
doublewords.
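This section does not define BFLB precisely, so the sketch below assumes it reverses the four 16-bit elements of one 64-bit vector, which is the behavior the array-reversal description relies on. Under that assumption, reversing a long array needs one load, one BFLB, and one store per doubleword, with each store going to the mirrored doubleword index. The names `bflb` and `reverse_array` are hypothetical.

```python
def bflb(v):
    """Hypothetical model: reverse the four 16-bit elements of one vector."""
    return v[::-1]


def reverse_array(a):
    """Reverse a list whose length is a whole number of 4-element doublewords."""
    if len(a) % 4:
        raise ValueError("length must be a multiple of 4 elements")
    n = len(a) // 4                       # number of doublewords
    out = [None] * len(a)
    for i in range(n):
        v = a[4 * i:4 * i + 4]            # load doubleword i
        j = n - 1 - i                     # mirrored doubleword index
        out[4 * j:4 * j + 4] = bflb(v)    # store the reversed vector there
    return out


print(reverse_array(list(range(12))))  # [11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```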

The present invention thus provides a method for providing element
alignment and ordering for SIMD processing. While the present invention
has been described in particular embodiments, it should be appreciated that
the present invention should not be construed as being limited by such
embodiments, but rather construed according to the claims below.
CLAIMS

What is claimed is: 

1. In a computer system including a processor having a plurality of
registers, a method for generating an aligned vector of first width from two
second width vectors for single instruction multiple data (SIMD) processing,
comprising the steps of:

loading a first vector from a memory unit into a first register, wherein
the first vector contains a first byte of an aligned vector to be generated;

loading a second vector from the memory unit into a second register;

determining a starting byte in the first register, wherein the starting byte
specifies the first byte of an aligned vector;

extracting a first width vector from the first register and the second
register beginning from the first bit in the first byte of the first register
continuing through the bits in the second register; and

replicating the extracted first width vector into a third register such that
the third register contains a plurality of elements aligned for SIMD
processing.

2. The method as recited in Claim 1, further comprising the step of
storing the aligned vector in the third register to the memory unit.

3. The method as recited in Claim 1, wherein the first width and 
second width are each 64 bits. 

4. The method as recited in Claim 3, wherein the third register is
comprised of 8 8-bit elements.
5. The method as recited in Claim 3, wherein the third register is
comprised of 4 16-bit elements. 

6. The method as recited in Claim 1, wherein the starting byte is
specified as a constant in an alignment instruction.

7. The method as recited in Claim 1, wherein the starting byte is 
specified as a variable in a register in an alignment instruction. 

8. The method as recited in Claim 1, wherein the first vector and
the second vector are in contiguous locations in the memory unit.

9. The method as recited in Claim 1, wherein the processor 
operates in a big-endian byte ordering mode. 


10. The method as recited in Claim 1, wherein the processor 
operates in a little-endian byte ordering mode. 

11. In a computer system including a processor having a plurality of
registers, a method for generating an ordered set of elements in an N-bit
vector from two sets of elements in two N-bit vectors for single instruction
multiple data (SIMD) vector processing, said method comprising the steps of:

loading a first vector from a memory unit into a first register;

loading a second vector from the memory unit into a second register;

selecting a subset of elements from the first register and the second
register; and

replicating the elements from the subset into the elements of a third
register in a particular order suitable for subsequent SIMD vector processing.

12. The method as recited in Claim 11, further comprising the step of
storing the elements in the third register to the memory unit.

13. The method as recited in Claim 11, wherein the first vector and
the second vector are each comprised of 4 16-bit elements indexed from 0 to 3. 

14. The method as recited in Claim 11, wherein the first vector and
the second vector are each comprised of 8 8-bit elements indexed from 0 to 7.

15. The method as recited in Claim 13, wherein the subset is
comprised of two elements from the first register and two elements from the
second register.

16. The method as recited in Claim 14, wherein the subset is 
comprised of four elements from the first register and four elements from the 
second register. 


17. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 2 and 3 from the first register and the elements 2 
and 3 from the second register. 

18. The method as recited in Claim 17, wherein the particular order
of the elements in the third register comprises:

the element 0 replicated from the element 2 of the second register; 
the element 1 replicated from the element 2 of the first register;

the element 2 replicated from the element 3 of the second register; and 

the element 3 replicated from the element 3 of the first register. 

19. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 0 and 1 from the first register and the elements 0 
and 1 from the second register. 

20. The method as recited in Claim 19, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 0 of the first register; 
the element 2 replicated from the element 1 of the second register; and 
the element 3 replicated from the element 1 of the first register. 

21. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 1 and 3 from the first register and the elements 1 
and 3 from the second register. 

22. The method as recited in Claim 21, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 1 of the second register; 
the element 1 replicated from the element 3 of the second register; 
the element 2 replicated from the element 1 of the first register; and 
the element 3 replicated from the element 3 of the first register. 
23. The method as recited in Claim 13, wherein the subset is
comprised of the elements 0 and 2 from the first register and the elements 0 
and 2 from the second register. 

24. The method as recited in Claim 23, wherein the particular order
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 2 of the second register; 
the element 2 replicated from the element 0 of the first register; and 
the element 3 replicated from the element 2 of the first register.

25. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 0 and 2 from the first register and the elements 1 
and 3 from the second register. 

26. The method as recited in Claim 25, wherein the particular order
of the elements in the third register comprises: 

the element 0 replicated from the element 1 of the second register; 
the element 1 replicated from the element 0 of the first register; 
the element 2 replicated from the element 3 of the second register; and 
the element 3 replicated from the element 2 of the first register. 

27. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 0 and 2 from the first register and the elements 1 
and 3 from the second register. 
28. The method as recited in Claim 27, wherein the particular order
of the elements in the third register comprises: 

the element 0 replicated from the element 3 of the second register; 
the element 1 replicated from the element 2 of the first register; 
the element 2 replicated from the element 1 of the second register; and 
the element 3 replicated from the element 0 of the first register. 

29. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 2 and 3 from the first register and the elements 2 
and 3 from the second register. 

30. The method as recited in Claim 29, wherein the particular order of
the elements in the third register comprises: 

the element 0 replicated from the element 2 of the second register; 
the element 1 replicated from the element 3 of the second register; 
the element 2 replicated from the element 2 of the first register; and 
the element 3 replicated from the element 3 of the first register. 

31. The method as recited in Claim 13, wherein the subset is 
comprised of the elements 0 and 2 from the first register and the elements 0 
and 1 from the second register. 

32. The method as recited in Claim 31, wherein the particular order
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 1 of the second register; 
the element 2 replicated from the element 0 of the first register; and 
the element 3 replicated from the element 2 of the first register.

33. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 1, 3, 5, and 7 from the first register and the 
elements 1, 3, 5, and 7 from the second register. 

34. The method as recited in Claim 33, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 1 of the second register; 
the element 1 replicated from the element 3 of the second register; 
the element 2 replicated from the element 5 of the second register; 
the element 3 replicated from the element 7 of the second register;
the element 4 replicated from the element 1 of the first register; 
the element 5 replicated from the element 3 of the first register; 
the element 6 replicated from the element 5 of the first register; and 
the element 7 replicated from the element 7 of the first register. 

35. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 0, 2, 4, and 6 from the first register and the 
elements 0, 2, 4, and 6 from the second register.

36. The method as recited in Claim 35, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 2 of the second register; 
the element 2 replicated from the element 4 of the second register; 
the element 3 replicated from the element 6 of the second register; 
the element 4 replicated from the element 0 of the first register;
the element 5 replicated from the element 2 of the first register; 
the element 6 replicated from the element 4 of the first register; and 
the element 7 replicated from the element 6 of the first register. 


37. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 4, 5, 6, and 7 from the first register and the 
elements 4, 5, 6, and 7 from the second register. 


38. The method as recited in Claim 37, wherein the particular order
of the elements in the third register comprises:

the element 0 replicated from the element 4 of the second register; 

the element 1 replicated from the element 4 of the first register; 

the element 2 replicated from the element 5 of the second register; 
the element 3 replicated from the element 5 of the first register;

the element 4 replicated from the element 6 of the second register; 

the element 5 replicated from the element 6 of the first register; 

the element 6 replicated from the element 7 of the second register; and 

the element 7 replicated from the element 7 of the first register. 


39. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 0, 1, 2, and 3 from the first register and the
elements 0, 1, 2, and 3 from the second register. 

40. The method as recited in Claim 39, wherein the particular order
of the elements in the third register comprises:

the element 0 replicated from the element 0 of the second register; 
the element 1 replicated from the element 0 of the first register;
the element 2 replicated from the element 1 of the second register; 
the element 3 replicated from the element 1 of the first register; 
the element 4 replicated from the element 2 of the second register; 
the element 5 replicated from the element 2 of the first register;

the element 6 replicated from the element 3 of the second register; and
the element 7 replicated from the element 3 of the first register. 

41. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 4, 5, 6, and 7 from the first register.



42. The method as recited in Claim 41, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 4 of the first register; 
the element 2 replicated from the element 5 of the first register;

the element 4 replicated from the element 6 of the first register; 
the element 6 replicated from the element 7 of the first register; and 
the elements 1, 3, 5, and 7 containing a zero in all the bits. 

43. The method as recited in Claim 14, wherein the subset is
comprised of the elements 0, 1, 2, and 3 from the first register.

44. The method as recited in Claim 43, wherein the particular order 
of the elements in the third register comprises: 
the element 0 replicated from the element 0 of the first register;

the element 2 replicated from the element 1 of the first register; 
the element 4 replicated from the element 2 of the first register; 
the element 6 replicated from the element 3 of the first register; and
the elements 1, 3, 5, and 7 containing a zero in all the bits. 

45. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 4, 5, 6, and 7 from the first register. 

46. The method as recited in Claim 45, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 4 of the first register; 

the element 1 replicating the sign bit of the element 4 of the first 
register in all the bits; 

the element 2 replicated from the element 5 of the first register; 

the element 3 replicating the sign bit of the element 5 of the first 
register in all the bits; 

the element 4 replicated from the element 6 of the first register; 

the element 5 containing the sign bit of the element 6 of the first 
register in all the bits; 

the element 6 replicated from the element 7 of the first register; and 

the element 7 containing the sign bit of the element 7 of the first 
register in all the bits. 

47. The method as recited in Claim 14, wherein the subset is 
comprised of the elements 0, 1, 2, and 3 from the first register. 

48. The method as recited in Claim 47, wherein the particular order 
of the elements in the third register comprises: 

the element 0 replicated from the element 0 of the first register; 
the element 1 containing the sign bits of the element 0 of the first
register; 

the element 2 replicated from the element 1 of the first register; 
the element 3 containing the sign bits of the element 1 of the first 
register; 

the element 4 replicated from the element 2 of the first register; 
the element 5 containing the sign bits of the element 2 of the first 
register; 

the element 6 replicated from the element 3 of the first register; and 
the element 7 containing the sign bits of the element 3 of the first 
register. 
ABSTRACT

The present invention provides alignment and ordering of vector
elements for SIMD processing. In the alignment of vector elements for SIMD
processing, one vector is loaded from a memory unit into a first register and
another vector is loaded from the memory unit into a second register. The
first vector contains a first byte of an aligned vector to be generated. Then, a
starting byte specifying the first byte of an aligned vector is determined. Next,
a vector is extracted from the first register and the second register, beginning
from the first bit in the first byte of the first register and continuing through
the bits in the second register. Finally, the extracted vector is replicated into a
third register such that the third register contains a plurality of elements
aligned for SIMD processing. In the ordering of vector elements for SIMD
processing, a first vector is loaded from a memory unit into a first register and
a second vector is loaded from the memory unit into a second register. Then,
a subset of elements is selected from the first register and the second register.
The elements from the subset are then replicated into the elements of a third
register in a particular order suitable for subsequent SIMD vector
processing.
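The alignment steps of Claim 1 can be sketched in Python under big-endian byte ordering (one of the modes named in Claim 9). The sketch loads the two doublewords that straddle an unaligned address, takes the starting byte as the address offset within the first doubleword, and extracts 64 bits from the 128-bit concatenation of the two registers. The names `load_be` and `align_extract`, and the example memory contents, are hypothetical.

```python
MASK64 = (1 << 64) - 1


def load_be(mem, addr):
    """Load a 64-bit doubleword from a byte string as a big-endian integer."""
    return int.from_bytes(mem[addr:addr + 8], "big")


def align_extract(first, second, start):
    """Hypothetical model of the extraction step, big-endian byte order assumed.

    first, second: 64-bit register values (first holds the lower addresses).
    start: byte offset (0..7) of the aligned vector's first byte within first.
    """
    if not 0 <= start <= 7:
        raise ValueError("start must be 0..7")
    both = (first << 64) | second              # 128-bit concatenation
    return (both >> (64 - 8 * start)) & MASK64


mem = bytes(range(0x10, 0x30))   # example memory: 0x10, 0x11, ..., 0x2F
addr = 13                        # unaligned address of the wanted vector
base = addr & ~7                 # aligned doubleword holding its first byte
first, second = load_be(mem, base), load_be(mem, base + 8)
v = align_extract(first, second, addr - base)
print(hex(v))  # 0x1d1e1f2021222324
```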